Translation Demo
This project demonstrates how to use the OpenAI Realtime API over WebSockets to build a one-way translation application. A real-world use case is multilingual conversational translation: a speaker talks into the speaker app, and listeners hear the translation in their chosen language through the listener app. Imagine a conference room in which participants wear headphones and follow a live speaker, each in their own language. Because current audio models are turn-based, the speaker must pause briefly to let the model process and translate each stretch of speech. As models become faster and more efficient, this latency should shrink and the translation should feel more seamless.
How to Use
Running the Application
- Set up the OpenAI API:
  - If you're new to the OpenAI API, sign up for an account.
  - Follow the Quickstart to retrieve your API key.
- Clone the Repository:
  git clone <repository-url>
- Set your API key:
  - Create a .env file at the root of the project and add the following line:
    REACT_APP_OPENAI_API_KEY=<your_api_key>
- Install dependencies:
  Navigate to the project directory and run:
  npm install
- Run the Speaker & Listener Apps:
  npm start
  The speaker and listener apps will be available at:
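- Start the Mirror Server:
  In another terminal window, navigate to the project directory and run:
  node mirror-server/mirror-server.mjs
  The mirror server relays each language's translated audio from the speaker app to the connected listener apps.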
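A missing or misnamed key in .env is an easy setup mistake, so it can help to check it early. The sketch below assumes the React apps are built with Create React App, which only exposes environment variables prefixed REACT_APP_ to browser code; readApiKey is a hypothetical helper, not part of the project.

```typescript
// Hypothetical helper; the project may read the key differently.
type Env = Record<string, string | undefined>;

// Assumption: Create React App only exposes variables that start with REACT_APP_.
const API_KEY_VAR = "REACT_APP_OPENAI_API_KEY";

function readApiKey(env: Env): string {
  // Trim stray whitespace that often sneaks into .env files.
  const key = env[API_KEY_VAR]?.trim();
  if (!key) {
    throw new Error(`${API_KEY_VAR} is missing; add it to the .env file at the project root`);
  }
  return key;
}

// In the React app this would be called as readApiKey(process.env);
// a plain object stands in for process.env here.
const apiKey = readApiKey({ REACT_APP_OPENAI_API_KEY: "<your_api_key>" });
```

A key stored under a name without the REACT_APP_ prefix (for example OPENAI_API_KEY) is rejected by this check, which mirrors the fact that such a variable would not reach the browser build.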
Adding a New Language
To add a new language to the codebase, follow these steps:
- Socket Event Handling in Mirror Server:
  - Open mirror-server/mirror-server.mjs.
  - Add a new socket event for the new language. For example, for Hindi:
    socket.on('mirrorAudio:hi', (audioChunk) => {
      // Relay Hindi audio from the speaker app to every other connected client
      console.log('logging Hindi mirrorAudio', audioChunk);
      socket.broadcast.emit('audioFrame:hi', audioChunk);
    });
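Rather than copying a socket.on block for every language, the same relay can be registered in a loop. This is a minimal sketch: MinimalSocket is a hypothetical stand-in for the part of the socket API the snippet above uses (on and broadcast.emit), LANGUAGE_CODES is an assumed list mirroring the ListenerPage languages, and the in-memory fake socket exists only so the example runs without a server. The per-chunk console.log from the original handler is omitted.

```typescript
type Handler = (chunk: unknown) => void;

// Hypothetical subset of the socket API used by the mirror server snippet.
interface MinimalSocket {
  on(event: string, handler: Handler): void;
  broadcast: { emit(event: string, chunk: unknown): void };
}

// Assumed list of supported codes; keep it in sync with the speaker and listener pages.
const LANGUAGE_CODES: string[] = ["fr", "es", "tl", "en", "zh", "hi"];

function registerMirrorHandlers(socket: MinimalSocket, codes: string[]): void {
  codes.forEach((code) => {
    // Audio from the speaker app arrives as mirrorAudio:<code> ...
    socket.on(`mirrorAudio:${code}`, (chunk) => {
      // ... and is re-emitted to the other clients as audioFrame:<code>.
      socket.broadcast.emit(`audioFrame:${code}`, chunk);
    });
  });
}

// In-memory fake socket that records broadcasts instead of sending them.
function createFakeSocket() {
  const handlers: Record<string, Handler> = {};
  const sent: Array<[string, unknown]> = [];
  const socket: MinimalSocket = {
    on: (event, handler) => {
      handlers[event] = handler;
    },
    broadcast: {
      emit: (event, chunk) => {
        sent.push([event, chunk]);
      },
    },
  };
  // Simulates the speaker app sending an event to this socket.
  const receive = (event: string, chunk: unknown): void => {
    const handler = handlers[event];
    if (handler) handler(chunk);
  };
  return { socket, receive, sent };
}

const fake = createFakeSocket();
registerMirrorHandlers(fake.socket, LANGUAGE_CODES);
fake.receive("mirrorAudio:hi", "chunk-1");
// fake.sent now holds [["audioFrame:hi", "chunk-1"]]
```

With this shape, adding a language on the server side becomes a one-word change to the code list.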
- Instructions Configuration:
  - Open src/utils/translation_prompts.js.
  - Add new instructions for the new language. For example:
    export const hindi_instructions = "Your Hindi instructions here...";
- Realtime Client Initialization in SpeakerPage:
  - Open src/pages/SpeakerPage.tsx.
  - Import the new language instructions:
    import { hindi_instructions } from '../utils/translation_prompts.js';
  - Add the new language to the languageConfigs array:
    const languageConfigs = [
      // ... existing languages ...
      { code: 'hi', instructions: hindi_instructions },
    ];
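The SpeakerPage code is not shown here, so the following is only a sketch of how each languageConfigs entry could become the settings for its own Realtime connection. It assumes the settings are sent as a Realtime API session.update event, where turn_detection set to { type: "server_vad" } corresponds to VAD mode and null to manual push-to-talk. buildSessionUpdates and the placeholder instruction strings are hypothetical.

```typescript
type TurnMode = "vad" | "manual";

interface LanguageConfig {
  code: string;
  instructions: string;
}

// Assumed shape of the session.update client event (only the fields used here).
interface SessionUpdateEvent {
  type: "session.update";
  session: {
    instructions: string;
    // null turns off server-side voice activity detection (push-to-talk).
    turn_detection: { type: "server_vad" } | null;
  };
}

// Hypothetical helper: one session.update event per language code.
function buildSessionUpdates(
  configs: LanguageConfig[],
  mode: TurnMode,
): Record<string, SessionUpdateEvent> {
  const updates: Record<string, SessionUpdateEvent> = {};
  for (const { code, instructions } of configs) {
    if (Object.prototype.hasOwnProperty.call(updates, code)) {
      throw new Error(`Duplicate language code in languageConfigs: ${code}`);
    }
    updates[code] = {
      type: "session.update",
      session: {
        instructions,
        turn_detection: mode === "vad" ? { type: "server_vad" } : null,
      },
    };
  }
  return updates;
}

// Placeholder strings; in the app these would come from translation_prompts.js.
const updates = buildSessionUpdates(
  [
    { code: "fr", instructions: "Placeholder French instructions" },
    { code: "hi", instructions: "Placeholder Hindi instructions" },
  ],
  "manual",
);
// Each language's connection would then send its own event, e.g.
// ws.send(JSON.stringify(updates["hi"]));
```

In manual mode the server no longer decides when a turn ends, so the client would have to commit the audio buffer and request a response itself (the Realtime API's input_audio_buffer.commit and response.create events).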
- Language Configuration in ListenerPage:
  - Open src/pages/ListenerPage.tsx.
  - Locate the languages object, which centralizes all language-related data.
  - Add a new entry for your language. The key should be the language code, and the value should be an object containing the language name:
    const languages = {
      fr: { name: 'French' },
      es: { name: 'Spanish' },
      tl: { name: 'Tagalog' },
      en: { name: 'English' },
      zh: { name: 'Mandarin' },
      // Add your new language here
      hi: { name: 'Hindi' }, // Example for adding Hindi
    } as const;
  - The ListenerPage component then automatically adds the new language to the dropdown menu and handles its audio stream.
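Because the languages object is the single source of truth in ListenerPage, both the dropdown and the audio subscriptions can be derived from it. A minimal sketch: dropdownOptions and audioEventFor are hypothetical names, and the audioFrame:<code> event naming is taken from the mirror server snippet rather than from the component itself.

```typescript
const languages = {
  fr: { name: "French" },
  es: { name: "Spanish" },
  tl: { name: "Tagalog" },
  en: { name: "English" },
  zh: { name: "Mandarin" },
  hi: { name: "Hindi" },
} as const;

type LanguageCode = keyof typeof languages;

interface DropdownOption {
  value: LanguageCode;
  label: string;
}

// Object.keys returns string[], so cast once to the known key union.
const languageCodes = Object.keys(languages) as LanguageCode[];

// One dropdown option per entry, in declaration order.
const dropdownOptions: DropdownOption[] = languageCodes.map((code) => ({
  value: code,
  label: languages[code].name,
}));

// Event name a listener would subscribe to for a given language (assumed naming).
function audioEventFor(code: LanguageCode): string {
  return `audioFrame:${code}`;
}
```

Deriving everything from one object means a typo in a new entry surfaces as a type error rather than a silent mismatch between the menu and the stream.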
- Test the New Language:
  - Run your application and test the new language by selecting it from the dropdown menu.
  - Ensure that the audio stream for the new language is correctly received and played.
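A common failure when adding a language is updating only some of the places listed above, in which case its audio never reaches listeners. The hypothetical check below compares the language codes registered in the mirror server, in languageConfigs, and in the languages object; the code lists are typed in by hand, since reading them out of the actual files is beyond this sketch.

```typescript
// Hypothetical registry of codes found in each place a language must be added.
interface LanguageRegistry {
  mirrorServer: string[]; // codes with a mirrorAudio:<code> handler
  speakerPage: string[]; // codes in languageConfigs
  listenerPage: string[]; // keys of the languages object
}

type Place = keyof LanguageRegistry;
const PLACES: Place[] = ["mirrorServer", "speakerPage", "listenerPage"];

// Returns, for each code that is incomplete, the places where it is missing.
function findMissingLanguages(registry: LanguageRegistry): Record<string, Place[]> {
  // Every code mentioned anywhere, without duplicates.
  const allCodes = PLACES
    .map((place) => registry[place])
    .reduce((acc, codes) => acc.concat(codes), [] as string[])
    .filter((code, i, arr) => arr.indexOf(code) === i);

  const missing: Record<string, Place[]> = {};
  for (const code of allCodes) {
    const absentFrom = PLACES.filter((place) => registry[place].indexOf(code) === -1);
    if (absentFrom.length > 0) {
      missing[code] = absentFrom;
    }
  }
  return missing;
}

// Example: Hindi was added to the mirror server and listener page, but not the speaker page.
const report = findMissingLanguages({
  mirrorServer: ["fr", "es", "hi"],
  speakerPage: ["fr", "es"],
  listenerPage: ["fr", "es", "hi"],
});
// report is { hi: ["speakerPage"] }
```

An empty result means every code appears in all three places; it does not prove the audio plays, so the manual dropdown test is still needed.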
Demo Flow
- Connect in the Speaker App:
  - Click "Connect" and wait for the WebSocket connections to the Realtime API to be established.
  - Choose between VAD (Voice Activity Detection) and manual push-to-talk mode.
  - The speaker should pause regularly so the translation can catch up; the model is turn-based and cannot stream translations continuously.
  - The speaker can view live translations for each language in the Speaker App.
- Select Language in the Listener App:
  - Select the language from the dropdown menu.
  - The listener app plays the translated audio. All languages are translated simultaneously, but only the selected language is played, and you can switch languages at any time.
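The listener behavior described above (every stream arrives, only one is played) reduces to a small filter. LanguagePlayer is a hypothetical class that queues chunks into an array instead of feeding a real audio player; clearing the queue on a language switch is a design choice of this sketch, not documented behavior of the app.

```typescript
// Hypothetical listener-side filter: frames for every language arrive,
// but only frames for the selected language are queued for playback.
class LanguagePlayer {
  private selected: string;
  // Stand-in for the real audio player: chunks waiting to be played.
  private readonly queue: ArrayBuffer[] = [];

  constructor(initialLanguage: string) {
    this.selected = initialLanguage;
  }

  // Called for every audioFrame:<code> event, whatever the selection.
  onFrame(code: string, chunk: ArrayBuffer): void {
    if (code === this.selected) {
      this.queue.push(chunk);
    }
  }

  // Design choice of this sketch: drop audio queued for the old language
  // so the listener does not hear its tail after switching.
  select(code: string): void {
    if (code !== this.selected) {
      this.selected = code;
      this.queue.length = 0;
    }
  }

  pendingChunks(): number {
    return this.queue.length;
  }
}

const player = new LanguagePlayer("fr");
player.onFrame("fr", new ArrayBuffer(4)); // queued
player.onFrame("es", new ArrayBuffer(4)); // ignored: Spanish is not selected
player.select("es"); // clears the queued French chunk
player.onFrame("es", new ArrayBuffer(4)); // queued
```

Keeping every stream flowing and filtering on the client is what makes switching instant: the new language's frames are already arriving when the selection changes.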