openai-cookbook/examples/GPT_with_vision_for_video_understanding.ipynb


{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Processing and narrating a video with GPT-4.1-mini's visual capabilities and the GPT-4o TTS API\n",
"\n",
"This notebook demonstrates how to use GPT's visual capabilities with a video. Although GPT-4.1-mini doesn't take videos as input directly, we can use its vision capabilities and its 1M-token context window to describe the static frames of a whole video at once. We'll walk through two examples:\n",
"\n",
"1. Using GPT-4.1-mini to get a description of a video\n",
"2. Generating a voiceover for a video with the GPT-4o TTS API\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from IPython.display import display, Image, Audio\n",
"\n",
"import cv2  # OpenCV is used to read the video; install it with `pip install opencv-python`\n",
"import base64\n",
"import time\n",
"from openai import OpenAI\n",
"import os\n",
"\n",
"client = OpenAI(api_key=os.environ.get(\"OPENAI_API_KEY\", \"<your OpenAI API key if not set as env var>\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Using GPT's visual capabilities to get a description of a video\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, we use OpenCV to extract frames from a nature [video](https://www.youtube.com/watch?v=kQ_7GtE529M) containing bison and wolves:\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"618 frames read.\n"
]
}
],
"source": [
"video = cv2.VideoCapture(\"data/bison.mp4\")\n",
"\n",
"base64Frames = []\n",
"while video.isOpened():\n",
"    success, frame = video.read()\n",
"    if not success:\n",
"        break\n",
"    _, buffer = cv2.imencode(\".jpg\", frame)\n",
"    base64Frames.append(base64.b64encode(buffer).decode(\"utf-8\"))\n",
"\n",
"video.release()\n",
"print(len(base64Frames), \"frames read.\")"
]
},
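{
"cell_type": "markdown",
"metadata": {},
"source": [
"The loop above keeps every frame. If you would rather sample by time, the cell below is one option, sketched under our own assumptions rather than taken from the original walkthrough: it reads the frame rate with `cv2.CAP_PROP_FPS` and keeps roughly one frame per `seconds_per_frame` seconds. `sample_frame_indices` and `sampled_frames` are hypothetical names introduced only for this example.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import cv2\n",
"\n",
"\n",
"def sample_frame_indices(total_frames, fps, seconds_per_frame=1.0):\n",
"    \"\"\"Hypothetical helper: indices of frames spaced about `seconds_per_frame` apart.\"\"\"\n",
"    # OpenCV may report 0 fps when the container lacks that metadata; fall back to a step of 1 (every frame).\n",
"    step = max(1, int(round(fps * seconds_per_frame)))\n",
"    return list(range(0, total_frames, step))\n",
"\n",
"\n",
"capture = cv2.VideoCapture(\"data/bison.mp4\")\n",
"fps = capture.get(cv2.CAP_PROP_FPS)  # frames per second reported for the file\n",
"capture.release()\n",
"\n",
"indices = sample_frame_indices(len(base64Frames), fps, seconds_per_frame=1.0)\n",
"sampled_frames = [base64Frames[i] for i in indices]\n",
"print(f\"{fps:.2f} fps -> {len(sampled_frames)} frames at ~1 frame per second\")"
]
},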
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Display the frames to make sure we've read them correctly:\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<IPython.core.display.Image object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"display_handle = display(None, display_id=True)\n",
"for img in base64Frames:\n",
"    display_handle.update(Image(data=base64.b64decode(img.encode(\"utf-8\"))))\n",
"    time.sleep(0.025)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once we have the video frames, we craft our prompt and send a request to GPT. We don't need to send every frame for GPT to understand what's going on, so the request includes only every 25th frame:\n"
]
},
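{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick check of how much the stride shrinks the request, this cell counts the images selected by `base64Frames[0::25]` and the size of their base64 payload. With the 618 frames read above, the slice keeps 25 frames (indices 0, 25, ..., 600). `stride` and `selected_frames` are illustrative names, not part of the original notebook.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Same slice the request uses: every 25th frame, starting with the first.\n",
"stride = 25\n",
"selected_frames = base64Frames[0::stride]\n",
"\n",
"# Approximate size of the image data that will be sent, in megabytes of base64 text.\n",
"payload_mb = sum(len(frame) for frame in selected_frames) / 1e6\n",
"print(f\"Sending {len(selected_frames)} of {len(base64Frames)} frames (stride {stride}), ~{payload_mb:.1f} MB of base64 image data\")"
]
},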
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Witness the raw power and strategy of nature in this intense wildlife encounter captured in stunning detail. A determined pack of wolves surrounds a lone bison on a snowy plain, showcasing the relentless dynamics of predator and prey in the wild. As the wolves close in, the bison stands its ground amidst the swirling snow, illustrating a gripping battle for survival. This rare footage offers an up-close look at the resilience and instincts that govern life in the animal kingdom, making it a must-watch for nature enthusiasts and wildlife lovers alike. Experience the drama, tension, and beauty of this extraordinary moment frozen in time.\n"
]
}
],
"source": [
"response = client.responses.create(\n",
"    model=\"gpt-4.1-mini\",\n",
"    input=[\n",
"        {\n",
"            \"role\": \"user\",\n",
"            \"content\": [\n",
"                {\n",
"                    \"type\": \"input_text\",\n",
"                    \"text\": (\n",
"                        \"These are frames from a video that I want to upload. Generate a compelling description that I can upload along with the video.\"\n",
"                    )\n",
"                },\n",
"                *[\n",
"                    {\n",
"                        \"type\": \"input_image\",\n",
"                        \"image_url\": f\"data:image/jpeg;base64,{frame}\"\n",
"                    }\n",
"                    for frame in base64Frames[0::25]\n",
"                ]\n",
"            ]\n",
"        }\n",
"    ],\n",
")\n",
"\n",
"print(response.output_text)"
]
},
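{
"cell_type": "markdown",
"metadata": {},
"source": [
"Both requests in this notebook build the same `input` structure: one user message with a text prompt followed by sampled frames as JPEG data URLs. A small helper can remove that duplication. `build_frame_input` is a hypothetical name, and the per-image `detail` field (`low`, `high`, or `auto`) is included on the assumption that you may want to tune image fidelity against token usage. The cell only builds and inspects the message; it does not send a request.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def build_frame_input(prompt, frames, stride=25, detail=\"auto\"):\n",
"    \"\"\"Hypothetical helper: one user message holding a text prompt and every `stride`-th frame.\"\"\"\n",
"    content = [{\"type\": \"input_text\", \"text\": prompt}]\n",
"    content.extend(\n",
"        {\n",
"            \"type\": \"input_image\",\n",
"            \"image_url\": f\"data:image/jpeg;base64,{frame}\",\n",
"            \"detail\": detail,  # image fidelity setting: \"low\", \"high\", or \"auto\"\n",
"        }\n",
"        for frame in frames[::stride]\n",
"    )\n",
"    return [{\"role\": \"user\", \"content\": content}]\n",
"\n",
"\n",
"# Build the message without sending it, to inspect its shape.\n",
"message = build_frame_input(\"Describe these video frames.\", base64Frames)\n",
"print(len(message[0][\"content\"]), \"content parts (1 text prompt + sampled frames)\")\n",
"\n",
"# The result could then be passed as `input=`, e.g.:\n",
"# client.responses.create(model=\"gpt-4.1-mini\", input=message)"
]
},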
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Generating a voiceover for a video with GPT-4.1-mini and the GPT-4o TTS API\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's create a voiceover for this video in the style of David Attenborough. Using the same video frames, we prompt GPT to give us a short script:\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"In the frozen expanse of the winter landscape, a coordinated pack of wolves moves with calculated precision. Their target, a lone bison, is powerful but vulnerable when isolated. The wolves encircle their prey, their numbers overwhelming, displaying the brutal reality of survival in the wild. As the bison struggles to break free, reinforcements from the herd arrive just in time, charging into the pack. A dramatic clash unfolds, where strength meets strategy in the perpetual battle for life. Here, in the heart of natures harshest conditions, every moment is a testament to endurance and the delicate balance of predator and prey.\n"
]
}
],
"source": [
"result = client.responses.create(\n",
"    model=\"gpt-4.1-mini\",\n",
"    input=[\n",
"        {\n",
"            \"role\": \"user\",\n",
"            \"content\": [\n",
"                {\n",
"                    \"type\": \"input_text\",\n",
"                    \"text\": (\n",
"                        \"These are frames of a video. Create a short voiceover script in the style of David Attenborough. Only include the narration.\"\n",
"                    )\n",
"                },\n",
"                *[\n",
"                    {\n",
"                        \"type\": \"input_image\",\n",
"                        \"image_url\": f\"data:image/jpeg;base64,{frame}\"\n",
"                    }\n",
"                    for frame in base64Frames[0::25]\n",
"                ]\n",
"            ]\n",
"        }\n",
"    ]\n",
")\n",
"\n",
"print(result.output_text)"
]
},
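{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before generating audio, it can help to check whether the script roughly fits the clip. This sketch assumes a narration pace of about 150 words per minute (our assumption, not a property of the TTS model) and compares that estimate with the video length derived from the frame count and `cv2.CAP_PROP_FPS`. `WORDS_PER_MINUTE` and the other names are illustrative.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import cv2\n",
"\n",
"WORDS_PER_MINUTE = 150  # assumed calm documentary narration pace\n",
"\n",
"capture = cv2.VideoCapture(\"data/bison.mp4\")\n",
"fps = capture.get(cv2.CAP_PROP_FPS)\n",
"capture.release()\n",
"\n",
"# Video length from the number of frames read earlier; NaN if OpenCV reports 0 fps.\n",
"video_seconds = len(base64Frames) / fps if fps else float(\"nan\")\n",
"word_count = len(result.output_text.split())\n",
"narration_seconds = word_count / WORDS_PER_MINUTE * 60\n",
"print(f\"Video: ~{video_seconds:.1f} s; script: {word_count} words, ~{narration_seconds:.1f} s at {WORDS_PER_MINUTE} wpm\")"
]
},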
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can use the GPT-4o mini TTS model (`gpt-4o-mini-tts`) and give it a set of instructions describing how the voice should sound. You can experiment with different voices and instructions at [OpenAI.fm](https://openai.fm). We then pass in the script generated above with GPT-4.1-mini to produce the voiceover audio:\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
" <audio controls=\"controls\" >\n",
" Your browser does not support the audio element.\n",
" </audio>\n",
" "
],
"text/plain": [
"<IPython.lib.display.Audio object>"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"instructions = \"\"\"\n",
"Voice Affect: Calm, measured, and warmly engaging; convey awe and quiet reverence for the natural world.\n",
"\n",
"Tone: Inquisitive and insightful, with a gentle sense of wonder and deep respect for the subject matter.\n",
"\n",
"Pacing: Even and steady, with slight lifts in rhythm when introducing a new species or unexpected behavior; natural pauses to allow the viewer to absorb visuals.\n",
"\n",
"Emotion: Subtly emotive—imbued with curiosity, empathy, and admiration without becoming sentimental or overly dramatic.\n",
"\n",
"Emphasis: Highlight scientific and descriptive language (“delicate wings shimmer in the sunlight,” “a symphony of unseen life,” “ancient rituals played out beneath the canopy”) to enrich imagery and understanding.\n",
"\n",
"Pronunciation: Clear and articulate, with precise enunciation and slightly rounded vowels to ensure accessibility and authority.\n",
"\n",
"Pauses: Insert thoughtful pauses before introducing key facts or transitions (“And then... with a sudden rustle...”), allowing space for anticipation and reflection.\n",
"\"\"\"\n",
"\n",
"audio_response = client.audio.speech.create(\n",
"    model=\"gpt-4o-mini-tts\",\n",
"    voice=\"echo\",\n",
"    instructions=instructions,\n",
"    input=result.output_text,\n",
"    response_format=\"wav\",\n",
")\n",
"\n",
"audio_bytes = audio_response.content\n",
"Audio(data=audio_bytes)"
]
}
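,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To keep the narration for later use, you can write the WAV bytes returned by the speech endpoint to disk. The filename `bison_voiceover.wav` is an assumption; pick any path you like.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from pathlib import Path\n",
"\n",
"# audio_bytes holds the WAV data returned by client.audio.speech.create above.\n",
"output_path = Path(\"bison_voiceover.wav\")  # hypothetical output filename\n",
"output_path.write_bytes(audio_bytes)\n",
"print(f\"Wrote {output_path.stat().st_size} bytes to {output_path}\")"
]
}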
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}