openai-cookbook/examples/GPT_with_vision_for_video_understanding.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Processing and narrating a video with GPT-4.1-mini's visual capabilities and GPT-4o TTS API\n",
    "\n",
    "This notebook demonstrates how to use GPT's visual capabilities with a video. Although GPT-4.1-mini doesn't take videos as input directly, we can use vision and the 1M token context window to describe the static frames of a whole video at once. We'll walk through two examples:\n",
    "\n",
    "1. Using GPT-4.1-mini to get a description of a video\n",
    "2. Generating a voiceover for a video with GPT-4o TTS API\n"
   ]
  },
  {
   "cell_type": "code",
<<<<<<< HEAD
   "execution_count": 2,
=======
   "execution_count": 46,
>>>>>>> main
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.display import display, Image, Audio\n",
    "\n",
    "import cv2  # We're using OpenCV to read video, to install !pip install opencv-python\n",
    "import base64\n",
    "import time\n",
    "from openai import OpenAI\n",
    "import os\n",
    "\n",
    "client = OpenAI(api_key=os.environ.get(\"OPENAI_API_KEY\", \"<your OpenAI API key if not set as env var>\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Using GPT's visual capabilities to get a description of a video\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, we use OpenCV to extract frames from a nature [video](https://www.youtube.com/watch?v=kQ_7GtE529M) containing bisons and wolves:\n"
   ]
  },
  {
   "cell_type": "code",
<<<<<<< HEAD
   "execution_count": 3,
=======
   "execution_count": null,
>>>>>>> main
   "metadata": {},
   "outputs": [],
   "source": [
    "video = cv2.VideoCapture(\"data/bison.mp4\")\n",
    "\n",
    "base64Frames = []\n",
    "while video.isOpened():\n",
    "    success, frame = video.read()\n",
    "    if not success:\n",
    "        break\n",
    "    _, buffer = cv2.imencode(\".jpg\", frame)\n",
    "    base64Frames.append(base64.b64encode(buffer).decode(\"utf-8\"))\n",
    "\n",
    "video.release()\n",
    "print(len(base64Frames), \"frames read.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Display frames to make sure we've read them in correctly:\n"
   ]
  },
  {
   "cell_type": "code",
<<<<<<< HEAD
   "execution_count": 4,
=======
   "execution_count": null,
>>>>>>> main
   "metadata": {},
   "outputs": [],
   "source": [
    "display_handle = display(None, display_id=True)\n",
    "for img in base64Frames:\n",
    "    display_handle.update(Image(data=base64.b64decode(img.encode(\"utf-8\"))))\n",
    "    time.sleep(0.025)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once we have the video frames, we craft our prompt and send a request to GPT (Note that we don't need to send every frame for GPT to understand what's going on):\n"
   ]
  },
  {
   "cell_type": "code",
<<<<<<< HEAD
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Witness an intense and gripping wildlife encounter in the heart of a snowy wilderness. This extraordinary video captures a fearless pack of wolves as they courageously surround and confront a mighty bison. As the icy wind sweeps across the barren landscape, watch the raw power and strategic teamwork of the wolves unfold in a dramatic struggle for survival. The video showcases nature's harsh realities and the delicate balance of predator and prey, highlighting the wolves' determination and the bison's formidable strength. Prepare for an unforgettable glimpse into the relentless dance of life and death in the wild. Don’t miss this captivating moment of nature’s untamed drama!\n"
     ]
    }
   ],
=======
   "execution_count": null,
   "metadata": {},
   "outputs": [],
>>>>>>> main
   "source": [
    "response = client.responses.create(\n",
    "    model=\"gpt-4.1-mini\",\n",
    "    input=[\n",
    "        {\n",
    "            \"role\": \"user\",\n",
    "            \"content\": [\n",
    "                {\n",
    "                    \"type\": \"input_text\",\n",
    "                    \"text\": (\n",
    "                        \"These are frames from a video that I want to upload. Generate a compelling description that I can upload along with the video.\"\n",
    "                    )\n",
    "                },\n",
    "                *[\n",
    "                    {\n",
    "                        \"type\": \"input_image\",\n",
    "                        \"image_url\": f\"data:image/jpeg;base64,{frame}\"\n",
    "                    }\n",
    "                    for frame in base64Frames[0::25]\n",
    "                ]\n",
    "            ]\n",
    "        }\n",
    "    ],\n",
    ")\n",
    "\n",
    "print(response.output_text)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Generating a voiceover for a video with GPT-4.1 and the GPT-4o TTS API\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's create a voiceover for this video in the style of David Attenborough. Using the same video frames we prompt GPT to give us a short script:\n"
   ]
  },
  {
   "cell_type": "code",
<<<<<<< HEAD
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "In the vast, unforgiving expanse of the winter tundra, a stealthy pack of wolves encircles their formidable prey: the mighty bison. With calculated precision, they close in, working as a cohesive unit to isolate their target from the herd. Though the bison stands its ground, outnumbered and pressured, the relentless wolves persist, their survival dependent on this crucial hunt. Each movement tells a story of nature's balance—predator and prey locked in an age-old dance for existence under the stark skies. Here, in this frozen wilderness, life and death hang in a delicate balance, governed by the instinct to endure.\n"
     ]
    }
   ],
=======
   "execution_count": null,
   "metadata": {},
   "outputs": [],
>>>>>>> main
   "source": [
    "new_result = client.responses.create(\n",
    "    model=\"gpt-4.1-mini\",\n",
    "    input=[\n",
    "        {\n",
    "            \"role\": \"user\",\n",
    "            \"content\": [\n",
    "                {\n",
    "                    \"type\": \"input_text\",\n",
    "                    \"text\": (\n",
    "                        \"These are frames of a video. Create a short voiceover script in the style of David Attenborough. Only include the narration.\"\n",
    "                    )\n",
    "                },\n",
    "                *[\n",
    "                    {\n",
    "                        \"type\": \"input_image\",\n",
    "                        \"image_url\": f\"data:image/jpeg;base64,{frame}\"\n",
    "                    }\n",
    "                    for frame in base64Frames[0::60]\n",
    "                ]\n",
    "            ]\n",
    "        }\n",
    "    ]\n",
    ")\n",
    "\n",
    "print(new_result.output_text)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, we can work with the GPT-4o TTS model and provide it a set of instructions on how the voice should sound. You can play around with the voice models and instructers at [OpenAI.fm](openai.fm). We can then pass in the script we generated above with GPT-4.1-mini and generate audio of the voiceover:\n"
   ]
  },
  {
   "cell_type": "code",
<<<<<<< HEAD
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "                <audio  controls=\"controls\" >\n",
       "                    <source src=\"data:audio/wav;base64,UklGRv////9XQVZFZm10IBAAAAABAAEAwF0AAIC7AAACABAAZGF0Yf/////5//b/9f/1//T/8f/y//L/8//z//X/9P/x//H/8f/x//H/7//x//L/8f/z//X/8v/z//T/9P/0//T/9P/3//n/+f/5//r/+v/7//7//f/+//3/AAAAAAEAAAAAAP////8AAAAAAgADAAQABAADAAcABgAHAAgACAAIAAUAAQD9/wAAAAABAP3/+v/4//X/9//3//T/8P/w/+3/7f/s/+r/6v/r/+3/6//p/+f/5//o/+n/6v/t/+7/7//y//b/9//9////AwAIAA0ADQATABkAGwAeACQAKgAtADMANgA2AD4AQABEAEUARwBGAEcASgBLAEwASQBHAEUAQwBBADwANgAvACoAKAAgABwAFQAOAA0AAgD+//n/8//w/+v/5v/i/9r/1f/T/9P/0f/T/9T/z//N/87/y//P/9D/1f/Z/9z/4f/d/9//3P/e/+L/5P/r/+b/5v/l/+H/5P/g/+b/5f/d/+D/2f/W/8v/xv/J/8r/0P/N/8X/w/+7/7r/uf+3/7z/vv/C/8X/vP/A/8H/xP/M/9b/3//k/+f/6P/o//T//f8GAA0AFAAUABsAHQAkACUALQAzADcAPAA6ADgANgA2AEAAQQBIAEsASwBQAEoATQBIAEkASgBJAFMAVABUAFMATgBLAFIAWwBhAGUAXwBZAFMATQBSAE0AUABMAEEAOQAbABwABgAFAAoA+P/7/9D/uv+o/4v/mv9//4//fv9m/23/NP9G/yz/N/9j/0f/ef9T/1r/YP9H/4T/d/+z/8r/0P8BAOX/FAARADAAcABxAMcAzwDpAAcB+AAoATkBbQGiAbMB5AHRAdgBzQGzAd4B1gEGAg8C/gHkAZ4BlAF6AX0BoQF9AYEBUgEZAfcA2QDWALoAwwChAH0AYAAYAA8A9f/b/97/sP+l/2//V/83/yX/JP/n/g//wP60/rv+Xv6E/kD+K/4i/v394/3G/Yf9Wv0s/ev89vy7/Lz8jPwf/Cf8oPuO+377E/tz+936APvy+nT6x/r9+Qb6wfmz+aD6rPpf+xP7Ovo/+sf5s/op/C79Y/4d/qH93vzY/Ir9Vv+0ASED7wPQAuAB8gCqAYADOQaXCLwIwQiXBnAFfQVYBvgIkwqIC5EK1QggB78F6gUFBwUI9Qj2B8AGqARwAugBXgGWAgoDCwOnArQALv/E/U79Df4Y/7T/IAD5/pn9v/wY/Ab9pf3R/sr+y/4n/qb90f1r/XT+3f43ANAAHwFGAaQAZAFGAk4D7wQDBVsFeAZjBuAHaQhMCDsJKAkGCjILVguDC3oL6wpjC4ALEQszC/gKtAqnCgEKXAhSB8QGWQYKB3AGyANAAk3/sv+fAA4AkgBk/P/5d/j495X6Cfvt+qj43vUt9Xv0+vYZ+Hj3Dvib9oX2/fZ49Yz2RvX+9cL4IPmc+g74vvX49Df1qPfT+LX5dvge90b3uPdu9672TPTu8+n0lfby+An35vTE8BDvuO9t8bXz4fMc9JDyTfJB8Y/wBvBY8aD0W/g++6X7g/lP91H35fl8/yUCogS7BHcDIwWyBFQGRAfwB2MKjgyXDvAOyw1ZDN8LqgzWDh8QQRGdECcQ+Q4jDZ8NTwwBDNcMlQueC/UJegh2Bx8GUQUaBOIDwwKQAnIC5wDcAFL/ff59/g/9i/0Y/mL9Yf4a/jH9l/z3+p77GP1e/iX/BACB/1X+cv5Y/xQBagKPA3cFAQbHBfkFZgaQB+cI2Ao+DGcNyQzrDNoNgA2hDQkO+g7dD1QQbxCcD3YNJAx9CxgMmAwjDKwL0wmKCPsFZgQMBA8C4QFnAcAACABS/bH7/fmm9673mvdz+AX5z/Yl9ezydvJs8m7ztvR+8xPzKfII8sbyqvGi8WHy8/GZ9B30uvJj8nPwivIU9Bj1zvUp87jwnO8E8HTxGPIE8zHzffEG72XrSesE7O7u0vSr9VD0vO4m6kvsKvE+96/8CP0d+Rj3Cve2+oz+mQHmAsEELAWeBZoHgAW/BuoHdgpGDnYOrg2nDfsL6AutDo4PFhBtD3oOaw6+DWEMxgyLDNoKvgpaCZsIbAgqBisGjAV1A7ECUwJ9ArgBtwGnADkAf/9S/i7/gv+i/5IA5v+W/tP+8P3U/6YBRQLiAt4BtQEdAvYDsAVyB3IIfAhFCXQJfwm7C94MxA3yD64P0Q+aD8AOXQ88ENgQGhHkEOEPGw/LDdgMnwwdC/kKqwrlCbMIrQYTBAQCHQHBANYA5v81/kX8YvrW9+j3Qvfn9lL32fW+9Qr0VfIV8kPyzPKM8sTyzvIA8cXwLfBt8O7x1/Bv8g3y3+9j8T3wKvBw8PzvPfC28UnzIfJ+8AvtMulC67DvMvFG9KbwHuze6H7lCeta7t7vNPTm8ybxcfC67Q3usPI69/78nQDKALn8A/o9+1n+fQNPB3gK+wm+B1MIDAgoCPgKdQugDdMPrQ9tD/QNWA38DJMNqg4MDwEPTQ5pDWUMOQrvCPsH0wewCGQILQeJBJQCwwHfAD4CbAIIAn0B0v80ADz/dv9bALkAKQJ6AVkBgwE6AewCjgMyBGIGQQUzBqUHQQdOCd8JigvPDGINEQ5KDu0PUxCUEYISGRJ+EhkSxRIvFMITLBNYEvgQShCpDxUQZA8oDqcMWgoyCtgHfAavBb0D6wKIAYIA4v6z/G/6kPmN+Jj3SPdv9qj1sfTq8sbxU/F38HLxBvKp8cXw/O4H7+zulu5373Xv6O+m7yvvue907sjunu6z75zwUu8Q8Jrvpu8f8ZDvsu5p7YTqje0I71Lx3/Dd7Kjr1+da6cTs4u8w81jynfGi8fnvkPHY8871BPy1/m0AOgEH/xD9Lv6sAokGuQriC2kKUgnLCIsIDwscDUcNxA9cDyMQxw/UDQAOCg1nDjwO9A62DwwOgAxhC2EJbAhfCLIH8QiXB4AGxwTqAg4CkwE+A4wDaQOiAXwA7wCmAdICFAOoA5ICsgFEA2oEKwZ5B0cHKQeXB4wH/wgHC7YLRQ3gDZwOPg/GDgsPyg9NEZIRYxNcFFMTdBJcEM8PbRHMEVcR2BG4DzcOEA1FC1ULCwmhCJYImQerBvkDngIFAKz9vv1S/X/9tvsX+tD4Jfcs9i30evQR9NTzZfQA9IjzavFH8DfwCPAx8F3xOPFC8TDwx+4Y78/tSu4l7f7v6/CH7+fvSuzF7PLrYexQ7+3vUe9V7VvqpukC6FfpDu2d7BTvBurF6GfocuQ/6yfuvfL89KXxS/Ew8JfyT/Ws+0EABQE3Ao8B/wHvAVsEbgZiCr0M6guqDQYMcAqKC3MN0w6fEGsPkg8CEJAOPw+7DiYP8g1gDZwN/gzzDMQKSAokCRsH5AYnBtcF7gQ6BMQDeQL0AfMBLQHVArYCdwJgAnUChwL4Ai0FpgSyBqMFNwabBt4GTgmJCVMLQwsnDK4LHQzVDEAO+Q/hEFcRzhCdEIMPfBB+EdoRnhLYEcwRLBAxDqcO8g3NDlYOOA0bDT0KTAh0B9IHHQe9BmYFtAMoAn//2v56/pP9xfwR/OX6c/li96T2YPax9XX2BvaV9q/1B/N18xPy0fHf8wHz3fMR85vxKPBp73LvmO9G72jwye+I79TuOO2c7q7rBu3167PsRu+I7WXtzOsJ67Lljufi58Popex/6vLr6ecw5WzjnOUg6jDu3PJ987ry0+/o70Lyg/d1+5oARgQxA9oC5AKnAlYFuQfGCogOpg0PDn8M/wxjDScNWhD3D1wRbw8bDygQww0XD9wOVw2vDbUKQApFCQEJ1wjQBzYI9ARZA3ICVgEcAskCzQOnA7gCmgIwAScBcwLkAoQF6QYDCOUHIwctBpIGYgjlCdwMxw08DscN1gwWDHcN/A49EEITExO9EukQwxAaEFQQUhJBEXASbREvEM8Qsw0YDawMiQu4DJUMMwtKCvkIYQZCBvcFZgV/BGsDBAM5AZgAFP/a/X/9n/yG+3j6XPpM+Wv4bPis9933IvbO9Vr
       "                    Your browser does not support the audio element.\n",
       "                </audio>\n",
       "              "
      ],
      "text/plain": [
       "<IPython.lib.display.Audio object>"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
=======
   "execution_count": null,
   "metadata": {},
   "outputs": [],
>>>>>>> main
   "source": [
    "instructions = \"\"\"\n",
    "Voice Affect: Calm, measured, and warmly engaging; convey awe and quiet reverence for the natural world.\n",
    "\n",
    "Tone: Inquisitive and insightful, with a gentle sense of wonder and deep respect for the subject matter.\n",
    "\n",
    "Pacing: Even and steady, with slight lifts in rhythm when introducing a new species or unexpected behavior; natural pauses to allow the viewer to absorb visuals.\n",
    "\n",
    "Emotion: Subtly emotive—imbued with curiosity, empathy, and admiration without becoming sentimental or overly dramatic.\n",
    "\n",
    "Emphasis: Highlight scientific and descriptive language (“delicate wings shimmer in the sunlight,” “a symphony of unseen life,” “ancient rituals played out beneath the canopy”) to enrich imagery and understanding.\n",
    "\n",
    "Pronunciation: Clear and articulate, with precise enunciation and slightly rounded vowels to ensure accessibility and authority.\n",
    "\n",
    "Pauses: Insert thoughtful pauses before introducing key facts or transitions (“And then... with a sudden rustle...”), allowing space for anticipation and reflection.\n",
    "\"\"\"\n",
    "\n",
    "audio_response = response = client.audio.speech.create(\n",
    "  model=\"gpt-4o-mini-tts\",\n",
    "  voice=\"echo\",\n",
    "  instructions=instructions,\n",
    "  input=new_result.output_text,\n",
    "  response_format=\"wav\"\n",
    ")\n",
    "\n",
    "audio_bytes = audio_response.content\n",
    "Audio(data=audio_bytes)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
-												Add examples and guides for DevDay releases (#820)


											
										
										
											2023-11-06 12:48:12 -08:00
+								{
 								 "cells": [
 								  {
 								   "cell_type": "markdown",
 								   "metadata": {},
 								   "source": [
-												Updating vision cookbook to 4.1 (#1783)


											
										
										
											2025-04-23 08:28:20 -07:00
+								    "# Processing and narrating a video with GPT-4.1-mini's visual capabilities and GPT-4o TTS API\n",
-												Add examples and guides for DevDay releases (#820)


											
										
										
											2023-11-06 12:48:12 -08:00
+								    "\n",
-												Updating vision cookbook to 4.1 (#1783)


											
										
										
											2025-04-23 08:28:20 -07:00
+								    "This notebook demonstrates how to use GPT's visual capabilities with a video. Although GPT-4.1-mini doesn't take videos as input directly, we can use vision and the 1M token context window to describe the static frames of a whole video at once. We'll walk through two examples:\n",
-												Add examples and guides for DevDay releases (#820)


											
										
										
											2023-11-06 12:48:12 -08:00
+								    "\n",
-												Updating vision cookbook to 4.1 (#1783)


											
										
										
											2025-04-23 08:28:20 -07:00
+								    "1. Using GPT-4.1-mini to get a description of a video\n",
 								    "2. Generating a voiceover for a video with GPT-4o TTS API\n"
-												Add examples and guides for DevDay releases (#820)


											
										
										
											2023-11-06 12:48:12 -08:00
+								   ]
 								  },
 								  {
 								   "cell_type": "code",
-												Updating vision cookbook to render properly (#1787)


											
										
										
											2025-04-23 11:00:40 -07:00
+								<<<<<<< HEAD
 								   "execution_count": 2,
 								=======
-												Updating vision cookbook to 4.1 (#1783)


											
										
										
											2025-04-23 08:28:20 -07:00
+								   "execution_count": 46,
-												Updating vision cookbook to render properly (#1787)


											
										
										
											2025-04-23 11:00:40 -07:00
+								>>>>>>> main
-												Add examples and guides for DevDay releases (#820)


											
										
										
											2023-11-06 12:48:12 -08:00
+								   "metadata": {},
 								   "outputs": [],
 								   "source": [
 								    "from IPython.display import display, Image, Audio\n",
 								    "\n",
-												Updating notebooks to use new Python SDK (#837)


											
										
										
											2023-11-14 13:31:13 -08:00
+								    "import cv2  # We're using OpenCV to read video, to install !pip install opencv-python\n",
-												Add examples and guides for DevDay releases (#820)


											
										
										
											2023-11-06 12:48:12 -08:00
+								    "import base64\n",
 								    "import time\n",
-												Updating notebooks to use new Python SDK (#837)


											
										
										
											2023-11-14 13:31:13 -08:00
+								    "from openai import OpenAI\n",
-												Add examples and guides for DevDay releases (#820)


											
										
										
											2023-11-06 12:48:12 -08:00
+								    "import os\n",
-												Updating notebooks to use new Python SDK (#837)


											
										
										
											2023-11-14 13:31:13 -08:00
+								    "\n",
-												Migrate all notebooks to API V1 (#914)

Co-authored-by: ayush rajgor <ayushrajgorar@gmail.com>
											
										
										
											2024-01-24 17:05:14 -08:00
+								    "client = OpenAI(api_key=os.environ.get(\"OPENAI_API_KEY\", \"<your OpenAI API key if not set as env var>\"))"
-												Add examples and guides for DevDay releases (#820)


											
										
										
											2023-11-06 12:48:12 -08:00
+								   ]
 								  },
 								  {
 								   "cell_type": "markdown",
 								   "metadata": {},
 								   "source": [
 								    "## 1. Using GPT's visual capabilities to get a description of a video\n"
 								   ]
 								  },
 								  {
 								   "cell_type": "markdown",
 								   "metadata": {},
 								   "source": [
-												fix some typos (#829)


											
										
										
											2023-11-11 01:32:51 +08:00
+								    "First, we use OpenCV to extract frames from a nature [video](https://www.youtube.com/watch?v=kQ_7GtE529M) containing bisons and wolves:\n"
-												Add examples and guides for DevDay releases (#820)


											
										
										
											2023-11-06 12:48:12 -08:00
+								   ]
 								  },
 								  {
 								   "cell_type": "code",
-												Updating vision cookbook to render properly (#1787)


											
										
										
											2025-04-23 11:00:40 -07:00
+								<<<<<<< HEAD
 								   "execution_count": 3,
 								=======
-												Updating vision cookbook to 4.1 (#1783)


											
										
										
											2025-04-23 08:28:20 -07:00
+								   "execution_count": null,
-												Updating vision cookbook to render properly (#1787)


											
										
										
											2025-04-23 11:00:40 -07:00
+								>>>>>>> main
-												Add examples and guides for DevDay releases (#820)


											
										
										
											2023-11-06 12:48:12 -08:00
+								   "metadata": {},
-												Updating vision cookbook to 4.1 (#1783)


											
										
										
											2025-04-23 08:28:20 -07:00
+								   "outputs": [],
-												Add examples and guides for DevDay releases (#820)


											
										
										
											2023-11-06 12:48:12 -08:00
+								   "source": [
-												Add video link and update paths (#822)


											
										
										
											2023-11-06 13:05:53 -08:00
+								    "video = cv2.VideoCapture(\"data/bison.mp4\")\n",
-												Add examples and guides for DevDay releases (#820)


											
										
										
											2023-11-06 12:48:12 -08:00
+								    "\n",
 								    "base64Frames = []\n",
 								    "while video.isOpened():\n",
 								    "    success, frame = video.read()\n",
 								    "    if not success:\n",
 								    "        break\n",
 								    "    _, buffer = cv2.imencode(\".jpg\", frame)\n",
 								    "    base64Frames.append(base64.b64encode(buffer).decode(\"utf-8\"))\n",
 								    "\n",
 								    "video.release()\n",
-												Fix audio output in GPT-V notebook (#908)


											
										
										
											2023-12-05 13:06:23 -08:00
+								    "print(len(base64Frames), \"frames read.\")"
-												Add examples and guides for DevDay releases (#820)


											
										
										
											2023-11-06 12:48:12 -08:00
+								   ]
 								  },
 								  {
 								   "cell_type": "markdown",
 								   "metadata": {},
 								   "source": [
 								    "Display frames to make sure we've read them in correctly:\n"
 								   ]
 								  },
 								  {
 								   "cell_type": "code",
-												Updating vision cookbook to render properly (#1787)


											
										
										
											2025-04-23 11:00:40 -07:00
+								<<<<<<< HEAD
 								   "execution_count": 4,
 								=======
-												Updating vision cookbook to 4.1 (#1783)


											
										
										
											2025-04-23 08:28:20 -07:00
+								   "execution_count": null,
-												Updating vision cookbook to render properly (#1787)


											
										
										
											2025-04-23 11:00:40 -07:00
+								>>>>>>> main
-												Add examples and guides for DevDay releases (#820)


											
										
										
											2023-11-06 12:48:12 -08:00
+								   "metadata": {},
-												Updating vision cookbook to 4.1 (#1783)


											
										
										
											2025-04-23 08:28:20 -07:00
+								   "outputs": [],
-												Add examples and guides for DevDay releases (#820)


											
										
										
											2023-11-06 12:48:12 -08:00
+								   "source": [
 								    "display_handle = display(None, display_id=True)\n",
 								    "for img in base64Frames:\n",
 								    "    display_handle.update(Image(data=base64.b64decode(img.encode(\"utf-8\"))))\n",
-												Fix audio output in GPT-V notebook (#908)


											
										
										
											2023-12-05 13:06:23 -08:00
+								    "    time.sleep(0.025)"
-												Add examples and guides for DevDay releases (#820)


											
										
										
											2023-11-06 12:48:12 -08:00
+								   ]
 								  },
 								  {
 								   "cell_type": "markdown",
 								   "metadata": {},
 								   "source": [
-												fix some typos (#829)


											
										
										
											2023-11-11 01:32:51 +08:00
+								    "Once we have the video frames, we craft our prompt and send a request to GPT (Note that we don't need to send every frame for GPT to understand what's going on):\n"
-												Add examples and guides for DevDay releases (#820)


											
										
										
											2023-11-06 12:48:12 -08:00
+								   ]
 								  },
 								  {
 								   "cell_type": "code",
-												Updating vision cookbook to render properly (#1787)


											
										
										
											2025-04-23 11:00:40 -07:00
+								<<<<<<< HEAD
 								   "execution_count": 5,
 								   "metadata": {},
 								   "outputs": [
 								    {
 								     "name": "stdout",
 								     "output_type": "stream",
 								     "text": [
 								      "Witness an intense and gripping wildlife encounter in the heart of a snowy wilderness. This extraordinary video captures a fearless pack of wolves as they courageously surround and confront a mighty bison. As the icy wind sweeps across the barren landscape, watch the raw power and strategic teamwork of the wolves unfold in a dramatic struggle for survival. The video showcases nature's harsh realities and the delicate balance of predator and prey, highlighting the wolves' determination and the bison's formidable strength. Prepare for an unforgettable glimpse into the relentless dance of life and death in the wild. Don’t miss this captivating moment of nature’s untamed drama!\n"
 								     ]
 								    }
 								   ],
 								=======
-												Updating vision cookbook to 4.1 (#1783)


											
										
										
											2025-04-23 08:28:20 -07:00
+								   "execution_count": null,
-												Add examples and guides for DevDay releases (#820)


											
										
										
											2023-11-06 12:48:12 -08:00
+								   "metadata": {},
-												Updating vision cookbook to 4.1 (#1783)


											
										
										
											2025-04-23 08:28:20 -07:00
+								   "outputs": [],
-												Updating vision cookbook to render properly (#1787)


											
										
										
											2025-04-23 11:00:40 -07:00
+								>>>>>>> main
-												Add examples and guides for DevDay releases (#820)


											
										
										
											2023-11-06 12:48:12 -08:00
+								   "source": [
-												Updating vision cookbook to 4.1 (#1783)


											
										
										
											2025-04-23 08:28:20 -07:00
+								    "response = client.responses.create(\n",
 								    "    model=\"gpt-4.1-mini\",\n",
 								    "    input=[\n",
 								    "        {\n",
 								    "            \"role\": \"user\",\n",
 								    "            \"content\": [\n",
 								    "                {\n",
 								    "                    \"type\": \"input_text\",\n",
 								    "                    \"text\": (\n",
 								    "                        \"These are frames from a video that I want to upload. Generate a compelling description that I can upload along with the video.\"\n",
 								    "                    )\n",
 								    "                },\n",
 								    "                *[\n",
 								    "                    {\n",
 								    "                        \"type\": \"input_image\",\n",
 								    "                        \"image_url\": f\"data:image/jpeg;base64,{frame}\"\n",
 								    "                    }\n",
 								    "                    for frame in base64Frames[0::25]\n",
 								    "                ]\n",
 								    "            ]\n",
 								    "        }\n",
 								    "    ],\n",
 								    ")\n",
-												Add examples and guides for DevDay releases (#820)


											
										
										
											2023-11-06 12:48:12 -08:00
+								    "\n",
-												Updating vision cookbook to 4.1 (#1783)


											
										
										
											2025-04-23 08:28:20 -07:00
+								    "print(response.output_text)"
-												Add examples and guides for DevDay releases (#820)


											
										
										
											2023-11-06 12:48:12 -08:00
+								   ]
 								  },
 								  {
 								   "cell_type": "markdown",
 								   "metadata": {},
 								   "source": [
-												Updating vision cookbook to 4.1 (#1783)


											
										
										
											2025-04-23 08:28:20 -07:00
+								    "## 2. Generating a voiceover for a video with GPT-4.1 and the GPT-4o TTS API\n"
-												Add examples and guides for DevDay releases (#820)


											
										
										
											2023-11-06 12:48:12 -08:00
+								   ]
 								  },
 								  {
 								   "cell_type": "markdown",
 								   "metadata": {},
 								   "source": [
 								    "Let's create a voiceover for this video in the style of David Attenborough. Using the same video frames we prompt GPT to give us a short script:\n"
 								   ]
 								  },
 								  {
 								   "cell_type": "code",
-												Updating vision cookbook to render properly (#1787)


											
										
										
											2025-04-23 11:00:40 -07:00
+								<<<<<<< HEAD
 								   "execution_count": 6,
 								   "metadata": {},
 								   "outputs": [
 								    {
 								     "name": "stdout",
 								     "output_type": "stream",
 								     "text": [
 								      "In the vast, unforgiving expanse of the winter tundra, a stealthy pack of wolves encircles their formidable prey: the mighty bison. With calculated precision, they close in, working as a cohesive unit to isolate their target from the herd. Though the bison stands its ground, outnumbered and pressured, the relentless wolves persist, their survival dependent on this crucial hunt. Each movement tells a story of nature's balance—predator and prey locked in an age-old dance for existence under the stark skies. Here, in this frozen wilderness, life and death hang in a delicate balance, governed by the instinct to endure.\n"
 								     ]
 								    }
 								   ],
 								=======
-												Updating vision cookbook to 4.1 (#1783)


											
										
										
											2025-04-23 08:28:20 -07:00
+								   "execution_count": null,
-												Add examples and guides for DevDay releases (#820)


											
										
										
											2023-11-06 12:48:12 -08:00
+								   "metadata": {},
-												Updating vision cookbook to 4.1 (#1783)


											
										
										
											2025-04-23 08:28:20 -07:00
+								   "outputs": [],
-												Updating vision cookbook to render properly (#1787)


											
										
										
											2025-04-23 11:00:40 -07:00
+								>>>>>>> main
-												Add examples and guides for DevDay releases (#820)


											
										
										
											2023-11-06 12:48:12 -08:00
+								   "source": [
-												Updating vision cookbook to 4.1 (#1783)


											
										
										
											2025-04-23 08:28:20 -07:00
+								    "new_result = client.responses.create(\n",
 								    "    model=\"gpt-4.1-mini\",\n",
 								    "    input=[\n",
 								    "        {\n",
 								    "            \"role\": \"user\",\n",
 								    "            \"content\": [\n",
 								    "                {\n",
 								    "                    \"type\": \"input_text\",\n",
 								    "                    \"text\": (\n",
 								    "                        \"These are frames of a video. Create a short voiceover script in the style of David Attenborough. Only include the narration.\"\n",
 								    "                    )\n",
 								    "                },\n",
 								    "                *[\n",
 								    "                    {\n",
 								    "                        \"type\": \"input_image\",\n",
 								    "                        \"image_url\": f\"data:image/jpeg;base64,{frame}\"\n",
 								    "                    }\n",
 								    "                    for frame in base64Frames[0::60]\n",
 								    "                ]\n",
 								    "            ]\n",
 								    "        }\n",
 								    "    ]\n",
 								    ")\n",
-												Add examples and guides for DevDay releases (#820)


											
										
										
											2023-11-06 12:48:12 -08:00
+								    "\n",
-												Updating vision cookbook to 4.1 (#1783)


											
										
										
											2025-04-23 08:28:20 -07:00
+								    "print(new_result.output_text)"
-												Add examples and guides for DevDay releases (#820)


											
										
										
											2023-11-06 12:48:12 -08:00
+								   ]
 								  },
 								  {
 								   "cell_type": "markdown",
 								   "metadata": {},
 								   "source": [
-												Updating vision cookbook to 4.1 (#1783)


											
										
										
											2025-04-23 08:28:20 -07:00
+								    "Now, we can work with the GPT-4o TTS model and provide it a set of instructions on how the voice should sound. You can play around with the voice models and instructers at [OpenAI.fm](openai.fm). We can then pass in the script we generated above with GPT-4.1-mini and generate audio of the voiceover:\n"
-												Add examples and guides for DevDay releases (#820)


											
										
										
											2023-11-06 12:48:12 -08:00
+								   ]
 								  },
 								  {
 								   "cell_type": "code",
-												Updating vision cookbook to render properly (#1787)


											
										
										
											2025-04-23 11:00:40 -07:00
+								<<<<<<< HEAD
 								   "execution_count": 7,
 								   "metadata": {},
 								   "outputs": [
 								    {
 								     "data": {
 								      "text/html": [
 								       "\n",
 								       "                <audio  controls=\"controls\" >\n",
 								       "                    <source src=\"data:audio/wav;base64,UklGRv////9XQVZFZm10IBAAAAABAAEAwF0AAIC7AAACABAAZGF0Yf/////5//b/9f/1//T/8f/y//L/8//z//X/9P/x//H/8f/x//H/7//x//L/8f/z//X/8v/z//T/9P/0//T/9P/3//n/+f/5//r/+v/7//7//f/+//3/AAAAAAEAAAAAAP////8AAAAAAgADAAQABAADAAcABgAHAAgACAAIAAUAAQD9/wAAAAABAP3/+v/4//X/9//3//T/8P/w/+3/7f/s/+r/6v/r/+3/6//p/+f/5//o/+n/6v/t/+7/7//y//b/9//9////AwAIAA0ADQATABkAGwAeACQAKgAtADMANgA2AD4AQABEAEUARwBGAEcASgBLAEwASQBHAEUAQwBBADwANgAvACoAKAAgABwAFQAOAA0AAgD+//n/8//w/+v/5v/i/9r/1f/T/9P/0f/T/9T/z//N/87/y//P/9D/1f/Z/9z/4f/d/9//3P/e/+L/5P/r/+b/5v/l/+H/5P/g/+b/5f/d/+D/2f/W/8v/xv/J/8r/0P/N/8X/w/+7/7r/uf+3/7z/vv/C/8X/vP/A/8H/xP/M/9b/3//k/+f/6P/o//T//f8GAA0AFAAUABsAHQAkACUALQAzADcAPAA6ADgANgA2AEAAQQBIAEsASwBQAEoATQBIAEkASgBJAFMAVABUAFMATgBLAFIAWwBhAGUAXwBZAFMATQBSAE0AUABMAEEAOQAbABwABgAFAAoA+P/7/9D/uv+o/4v/mv9//4//fv9m/23/NP9G/yz/N/9j/0f/ef9T/1r/YP9H/4T/d/+z/8r/0P8BAOX/FAARADAAcABxAMcAzwDpAAcB+AAoATkBbQGiAbMB5AHRAdgBzQGzAd4B1gEGAg8C/gHkAZ4BlAF6AX0BoQF9AYEBUgEZAfcA2QDWALoAwwChAH0AYAAYAA8A9f/b/97/sP+l/2//V/83/yX/JP/n/g//wP60/rv+Xv6E/kD+K/4i/v394/3G/Yf9Wv0s/ev89vy7/Lz8jPwf/Cf8oPuO+377E/tz+936APvy+nT6x/r9+Qb6wfmz+aD6rPpf+xP7Ovo/+sf5s/op/C79Y/4d/qH93vzY/Ir9Vv+0ASED7wPQAuAB8gCqAYADOQaXCLwIwQiXBnAFfQVYBvgIkwqIC5EK1QggB78F6gUFBwUI9Qj2B8AGqARwAugBXgGWAgoDCwOnArQALv/E/U79Df4Y/7T/IAD5/pn9v/wY/Ab9pf3R/sr+y/4n/qb90f1r/XT+3f43ANAAHwFGAaQAZAFGAk4D7wQDBVsFeAZjBuAHaQhMCDsJKAkGCjILVguDC3oL6wpjC4ALEQszC/gKtAqnCgEKXAhSB8QGWQYKB3AGyANAAk3/sv+fAA4AkgBk/P/5d/j495X6Cfvt+qj43vUt9Xv0+vYZ+Hj3Dvib9oX2/fZ49Yz2RvX+9cL4IPmc+g74vvX49Df1qPfT+LX5dvge90b3uPdu9672TPTu8+n0lfby+An35vTE8BDvuO9t8bXz4fMc9JDyTfJB8Y/wBvBY8aD0W/g++6X7g/lP91H35fl8/yUCogS7BHcDIwWyBFQGRAfwB2MKjgyXDvAOyw1ZDN8LqgzWDh8QQRGdECcQ+Q4jDZ8NTwwBDNcMlQueC/UJegh2Bx8GUQUaBOIDwwKQAnIC5wDcAFL/ff59/g/9i/0Y/mL9Yf4a/jH9l/z3+p77GP1e/iX/BACB/1X+cv5Y/xQBagKPA3cFAQbHBfkFZgaQB+cI2Ao+DGcNyQzrDNoNgA2hDQkO+g7dD1QQbxCcD3YNJAx9CxgMmAwjDKwL0wmKCPsFZgQMBA8C4QFnAcAACABS/bH7/fmm9673mvdz+AX5z/Yl9ezydvJs8m7ztvR+8xPzKfII8sbyqvGi8WHy8/GZ9B30uvJj8nPwivIU9Bj1zvUp87jwnO8E8HTxGPIE8zHzffEG72XrSesE7O7u0vSr9VD0vO4m6kvsKvE+96/8CP0d+Rj3Cve2+oz+mQHmAsEELAWeBZoHgAW/BuoHdgpGDnYOrg2nDfsL6AutDo4PFhBtD3oOaw6+DWEMxgyLDNoKvgpaCZsIbAgqBisGjAV1A7ECUwJ9ArgBtwGnADkAf/9S/i7/gv+i/5IA5v+W/tP+8P3U/6YBRQLiAt4BtQEdAvYDsAVyB3IIfAhFCXQJfwm7C94MxA3yD64P0Q+aD8AOXQ88ENgQGhHkEOEPGw/LDdgMnwwdC/kKqwrlCbMIrQYTBAQCHQHBANYA5v81/kX8YvrW9+j3Qvfn9lL32fW+9Qr0VfIV8kPyzPKM8sTyzvIA8cXwLfBt8O7x1/Bv8g3y3+9j8T3wKvBw8PzvPfC28UnzIfJ+8AvtMulC67DvMvFG9KbwHuze6H7lCeta7t7vNPTm8ybxcfC67Q3usPI69/78nQDKALn8A/o9+1n+fQNPB3gK+wm+B1MIDAgoCPgKdQugDdMPrQ9tD/QNWA38DJMNqg4MDwEPTQ5pDWUMOQrvCPsH0wewCGQILQeJBJQCwwHfAD4CbAIIAn0B0v80ADz/dv9bALkAKQJ6AVkBgwE6AewCjgMyBGIGQQUzBqUHQQdOCd8JigvPDGINEQ5KDu0PUxCUEYISGRJ+EhkSxRIvFMITLBNYEvgQShCpDxUQZA8oDqcMWgoyCtgHfAavBb0D6wKIAYIA4v6z/G/6kPmN+Jj3SPdv9qj1sfTq8sbxU/F38HLxBvKp8cXw/O4H7+zulu5373Xv6O+m7yvvue907sjunu6z75zwUu8Q8Jrvpu8f8ZDvsu5p7YTqje0I71Lx3/Dd7Kjr1+da6cTs4u8w81jynfGi8fnvkPHY8871BPy1/m0AOgEH/xD9Lv6sAokGuQriC2kKUgnLCIsIDwscDUcNxA9cDyMQxw/UDQAOCg1nDjwO9A62DwwOgAxhC2EJbAhfCLIH8QiXB4AGxwTqAg4CkwE+A4wDaQOiAXwA7wCmAdICFAOoA5ICsgFEA2oEKwZ5B0cHKQeXB4wH/wgHC7YLRQ3gDZwOPg/GDgsPyg9NEZIRYxNcFFMTdBJcEM8PbRHMEVcR2BG4DzcOEA1FC1ULCwmhCJYImQerBvkDngIFAKz9vv1S/X/9tvsX+tD4Jfcs9i30evQR9NTzZfQA9IjzavFH8DfwCPAx8F3xOPFC8TDwx+4Y78/tSu4l7f7v6/CH7+fvSuzF7PLrYexQ7+3vUe9V7VvqpukC6FfpDu2d7BTvBurF6GfocuQ/6yfuvfL89KXxS/Ew8JfyT/Ws+0EABQE3Ao8B/wHvAVsEbgZiCr0M6guqDQYMcAqKC3MN0w6fEGsPkg8CEJAOPw+7DiYP8g1gDZwN/gzzDMQKSAokCRsH5AYnBtcF7gQ6BMQDeQL0AfMBLQHVArYCdwJgAnUChwL4Ai0FpgSyBqMFNwabBt4GTgmJCVMLQwsnDK4LHQzVDEAO+Q/hEFcRzhCdEIMPfBB+EdoRnhLYEcwRLBAxDqcO8g3NDlYOOA0bDT0KTAh0B9IHHQe9BmYFtAMoAn//2v56/pP9xfwR/OX6c/li96T2YPax9XX2BvaV9q/1B/N18xPy0fHf8wHz3fMR85vxKPBp73LvmO9G72jwye+I79TuOO2c7q7rBu3167PsRu+I7WXtzOsJ67Lljufi58Popex/6vLr6ecw5WzjnOUg6jDu3PJ987ry0+/o70Lyg/d1+5oARgQxA9oC5AKnAlYFuQfGCogOpg0PDn8M/wxjDScNWhD3D1wRbw8bDygQww0XD9wOVw2vDbUKQApFCQEJ1wjQBzYI9ARZA3ICVgEcAskCzQOnA7gCmgIwAScBcwLkAoQF6QYDCOUHIwctBpIGYgjlCdwMxw08DscN1gwWDHcN/A49EEITExO9EukQwxAaEFQQUhJBEXASbREvEM8Qsw0YDawMiQu4DJUMMwtKCvkIYQZCBvcFZgV/BGsDBAM5AZgAFP/a/X/9n/yG+3j6XPpM+Wv4bPis9933IvbO9Vr
 								       "                    Your browser does not support the audio element.\n",
 								       "                </audio>\n",
 								       "              "
 								      ],
 								      "text/plain": [
 								       "<IPython.lib.display.Audio object>"
 								      ]
 								     },
 								     "execution_count": 7,
 								     "metadata": {},
 								     "output_type": "execute_result"
 								    }
 								   ],
 								=======
-												Updating vision cookbook to 4.1 (#1783)


											
										
										
											2025-04-23 08:28:20 -07:00
+								   "execution_count": null,
-												Add examples and guides for DevDay releases (#820)


											
										
										
											2023-11-06 12:48:12 -08:00
+								   "metadata": {},
-												Updating vision cookbook to 4.1 (#1783)


											
										
										
											2025-04-23 08:28:20 -07:00
+								   "outputs": [],
-												Updating vision cookbook to render properly (#1787)


											
										
										
											2025-04-23 11:00:40 -07:00
+								>>>>>>> main
-												Add examples and guides for DevDay releases (#820)


											
										
										
											2023-11-06 12:48:12 -08:00
+								   "source": [
-												Updating vision cookbook to 4.1 (#1783)


											
										
										
											2025-04-23 08:28:20 -07:00
+								    "instructions = \"\"\"\n",
 								    "Voice Affect: Calm, measured, and warmly engaging; convey awe and quiet reverence for the natural world.\n",
 								    "\n",
 								    "Tone: Inquisitive and insightful, with a gentle sense of wonder and deep respect for the subject matter.\n",
 								    "\n",
 								    "Pacing: Even and steady, with slight lifts in rhythm when introducing a new species or unexpected behavior; natural pauses to allow the viewer to absorb visuals.\n",
 								    "\n",
 								    "Emotion: Subtly emotive—imbued with curiosity, empathy, and admiration without becoming sentimental or overly dramatic.\n",
 								    "\n",
 								    "Emphasis: Highlight scientific and descriptive language (“delicate wings shimmer in the sunlight,” “a symphony of unseen life,” “ancient rituals played out beneath the canopy”) to enrich imagery and understanding.\n",
 								    "\n",
 								    "Pronunciation: Clear and articulate, with precise enunciation and slightly rounded vowels to ensure accessibility and authority.\n",
 								    "\n",
 								    "Pauses: Insert thoughtful pauses before introducing key facts or transitions (“And then... with a sudden rustle...”), allowing space for anticipation and reflection.\n",
 								    "\"\"\"\n",
 								    "\n",
 								    "audio_response = response = client.audio.speech.create(\n",
 								    "  model=\"gpt-4o-mini-tts\",\n",
 								    "  voice=\"echo\",\n",
 								    "  instructions=instructions,\n",
 								    "  input=new_result.output_text,\n",
 								    "  response_format=\"wav\"\n",
-												Add examples and guides for DevDay releases (#820)


											
										
										
											2023-11-06 12:48:12 -08:00
+								    ")\n",
 								    "\n",
-												Updating vision cookbook to 4.1 (#1783)


											
										
										
											2025-04-23 08:28:20 -07:00
+								    "audio_bytes = audio_response.content\n",
 								    "Audio(data=audio_bytes)"
-												Add examples and guides for DevDay releases (#820)


											
										
										
											2023-11-06 12:48:12 -08:00
+								   ]
 								  }
 								 ],
 								 "metadata": {
 								  "kernelspec": {
-												Updating vision cookbook to 4.1 (#1783)


											
										
										
											2025-04-23 08:28:20 -07:00
+								   "display_name": "Python 3",
-												Add examples and guides for DevDay releases (#820)


											
										
										
											2023-11-06 12:48:12 -08:00
+								   "language": "python",
 								   "name": "python3"
 								  },
 								  "language_info": {
 								   "codemirror_mode": {
 								    "name": "ipython",
 								    "version": 3
 								   },
 								   "file_extension": ".py",
 								   "mimetype": "text/x-python",
 								   "name": "python",
 								   "nbconvert_exporter": "python",
 								   "pygments_lexer": "ipython3",
-												Update existing cookbooks to use gpt-4o (#1197)


											
										
										
											2024-05-13 13:28:44 -04:00
+								   "version": "3.11.8"
-												Add examples and guides for DevDay releases (#820)


											
										
										
											2023-11-06 12:48:12 -08:00
+								  }
 								 },
 								 "nbformat": 4,
 								 "nbformat_minor": 2
 								}