From 985d09d110b9d9dee0573eebd27c2c9797ad7c94 Mon Sep 17 00:00:00 2001
From: erikakettleson-openai
Date: Tue, 25 Mar 2025 12:55:44 -0700
Subject: [PATCH] One way translation - update images (#1735)

---
 .../one_way_translation_using_realtime_api.mdx | 17 +++++++++++------
 registry.yaml                                  |  1 -
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/examples/voice_solutions/one_way_translation_using_realtime_api.mdx b/examples/voice_solutions/one_way_translation_using_realtime_api.mdx
index d84103b..ce833b5 100644
--- a/examples/voice_solutions/one_way_translation_using_realtime_api.mdx
+++ b/examples/voice_solutions/one_way_translation_using_realtime_api.mdx
@@ -10,13 +10,14 @@ A real-world use case for this demo is a multilingual, conversational translatio
 
 Let's explore the main functionalities and code snippets that illustrate how the app works. You can find the code in the [accompanying repo](https://github.com/openai/openai-cookbook/tree/main/examples/voice_solutions/one_way_translation_using_realtime_api/README.md ) if you want to run the app locally.
 
-### High Level Architecture Overview
+## High Level Architecture Overview
 
 This project has two applications - a speaker and listener app. The speaker app takes in audio from the browser, forks the audio and creates a unique Realtime session for each language and sends it to the OpenAI Realtime API via WebSocket. Translated audio streams back and is mirrored via a separate WebSocket server to the listener app. The listener app receives all translated audio streams simultaneously, but only the selected language is played. This architecture is designed for a POC and is not intended for a production use case. Let's dive into the workflow!
 
-![Architecture](translation_images/Realtime_flow_diagram.png)
+![Architecture](https://github.com/openai/openai-cookbook/blob/main/examples/voice_solutions/translation_images/Realtime_flow_diagram.png?raw=true)
+
+## Step 1: Language & Prompt Setup
 
-### Step 1: Language & Prompt Setup
 We need a unique stream for each language - each language requires a unique prompt and session with the Realtime API. We define these prompts in `translation_prompts.js`.
 
 
@@ -37,7 +38,8 @@ const languageConfigs = [
 
 ## Step 2: Setting up the Speaker App
 
-![SpeakerApp](translation_images/SpeakerApp.png)
+![SpeakerApp](https://github.com/openai/openai-cookbook/blob/main/examples/voice_solutions/translation_images/SpeakerApp.png?raw=true)
+
 
 We need to handle the setup and management of client instances that connect to the Realtime API, allowing the application to process and stream audio in different languages. `clientRefs` holds a map of `RealtimeClient` instances, each associated with a language code (e.g., 'fr' for French, 'es' for Spanish) representing each unique client connection to the Realtime API.
 
@@ -94,7 +96,8 @@ const connectConversation = useCallback(async () => {
 };
 ```
 
-### Step 3: Audio Streaming
+## Step 3: Audio Streaming
+
 Sending audio with WebSockets requires work to manage the inbound and outbound PCM16 audio streams ([more details on that](https://platform.openai.com/docs/guides/realtime-model-capabilities#handling-audio-with-websockets)). We abstract that using wavtools, a library for both recording and streaming audio data in the browser. Here we use `WavRecorder` for capturing audio in the browser.
 
 
@@ -114,7 +117,9 @@ const startRecording = async () => {
 };
 ```
 
-### Step 4: Showing Transcripts
+
+## Step 4: Showing Transcripts
+
 We listen for `response.audio_transcript.done` events to update the transcripts of the audio. These input transcripts are generated by the Whisper model in parallel to the GPT-4o Realtime inference that is doing the translations on raw audio.
 
 
diff --git a/registry.yaml b/registry.yaml
index a193d88..596c29b 100644
--- a/registry.yaml
+++ b/registry.yaml
@@ -1847,4 +1847,3 @@
 
     - audio
    - speech
-