diff --git a/authors.yaml b/authors.yaml
index 54c948a..655d849 100644
--- a/authors.yaml
+++ b/authors.yaml
@@ -93,3 +93,7 @@ royziv11:
website: "https://www.linkedin.com/in/roy-ziv-a46001149/"
avatar: "https://media.licdn.com/dms/image/D5603AQHkaEOOGZWtbA/profile-displayphoto-shrink_400_400/0/1699500606122?e=1716422400&v=beta&t=wKEIx-vTEqm9wnqoC7-xr1WqJjghvcjjlMt034hXY_4"
+justonf:
+ name: "Juston Forte"
+ website: "https://www.linkedin.com/in/justonforte"
+ avatar: "https://avatars.githubusercontent.com/u/96567547?s=400&u=08b9757200906ab12e3989b561cff6c4b95a12cb&v=4"
diff --git a/examples/GPT_with_vision_for_video_understanding.ipynb b/examples/GPT_with_vision_for_video_understanding.ipynb
index b0b7cb9..6f1fe38 100644
--- a/examples/GPT_with_vision_for_video_understanding.ipynb
+++ b/examples/GPT_with_vision_for_video_understanding.ipynb
@@ -4,12 +4,12 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "# Processing and narrating a video with GPT's visual capabilities and the TTS API\n",
+ "# Processing and narrating a video with GPT-4o's visual capabilities and the TTS API\n",
"\n",
- "This notebook demonstrates how to use GPT's visual capabilities with a video. GPT-4 doesn't take videos as input directly, but we can use vision and the new 128K context window to describe the static frames of a whole video at once. We'll walk through two examples:\n",
+ "This notebook demonstrates how to use GPT-4o's visual capabilities with a video. GPT-4o doesn't take videos as input directly, but we can use vision and the 128K context window to describe the static frames of a whole video at once. We'll walk through two examples:\n",
"\n",
- "1. Using GPT-4 to get a description of a video\n",
- "2. Generating a voiceover for a video with GPT-4 and the TTS API\n"
+ "1. Using GPT-4o to get a description of a video\n",
+ "2. Generating a voiceover for a video with GPT-4o and the TTS API\n"
]
},
{
@@ -118,9 +118,10 @@
"name": "stdout",
"output_type": "stream",
"text": [
- "\"🐺 Survival of the Fittest: An Epic Tale in the Snow ❄️ - Witness the intense drama of nature as a pack of wolves face off against mighty bison in a harsh winter landscape. This raw footage captures the essence of the wild where every creature fights for survival. With each frame, experience the tension, the strategy, and the sheer force exerted in this life-or-death struggle. See nature's true colors in this gripping encounter on the snowy plains. 🦬\"\n",
+ "Title: \"Epic Wildlife Showdown: Wolves vs. Bison in the Snow\"\n",
"\n",
- "Remember to respect wildlife and nature. This video may contain scenes that some viewers might find intense or distressing, but they depict natural animal behaviors important for ecological studies and understanding the reality of life in the wilderness.\n"
+ "Description: \n",
+ "Witness the raw power and strategy of nature in this intense and breathtaking video! A pack of wolves face off against a herd of bison in a dramatic battle for survival set against a stunning snowy backdrop. See how the wolves employ their cunning tactics while the bison demonstrate their strength and solidarity. This rare and unforgettable footage captures the essence of the wild like never before. Who will prevail in this ultimate test of endurance and skill? Watch to find out and experience the thrill of the wilderness! 🌨️🦊🐂 #Wildlife #NatureDocumentary #AnimalKingdom #SurvivalOfTheFittest #NatureLovers\n"
]
}
],
@@ -135,7 +136,7 @@
" },\n",
"]\n",
"params = {\n",
- " \"model\": \"gpt-4-vision-preview\",\n",
+ " \"model\": \"gpt-4o\",\n",
" \"messages\": PROMPT_MESSAGES,\n",
" \"max_tokens\": 200,\n",
"}\n",
@@ -167,15 +168,21 @@
"name": "stdout",
"output_type": "stream",
"text": [
- "In the vast, white expanse of the northern wilderness, a drama as old as time unfolds. Here, amidst the silence of the snow, the wolf pack circles, their breaths visible as they cautiously approach their formidable quarry, the bison. These wolves are practiced hunters, moving with strategic precision, yet the bison, a titan of strength, stands resolute, a force to be reckoned with.\n",
+ "In the frozen expanses of the North American wilderness, a battle unfolds—a testament to the harsh realities of survival.\n",
"\n",
- "As tension crackles in the frozen air, the wolves close in, their eyes locked on their target. The bison, wary of every movement, prepares to defend its life. It's a perilous dance between predator and prey, where each step could be the difference between life and death.\n",
+ "The pack of wolves, relentless and coordinated, closes in on the mighty bison. Exhausted and surrounded, the bison relies on its immense strength and bulk to fend off the predators.\n",
"\n",
- "In an instant, the quiet of the icy landscape is shattered. The bison charges, a desperate bid for survival as the pack swarms. The wolves are relentless, each one aware that their success depends on the strength of the collective. The bison, though powerful, is outnumbered, its massive form stirring up clouds of snow as it struggles.\n",
+ "But the wolves are cunning strategists. They work together, each member playing a role in the hunt, nipping at the bison's legs, forcing it into a corner.\n",
"\n",
- "It's an epic battle, a testament to the harsh realities of nature. In these moments, there is no room for error, for either side. The wolves, agile and tenacious, work in unison, their bites a chorus aiming to bring down the great beast. The bison, its every heaving breath a testament to its will to survive, fights fiercely, but the odds are not in its favor.\n",
+ "The alpha female leads the charge, her pack following her cues. They encircle their prey, tightening the noose with every passing second.\n",
"\n",
- "With the setting sun casting long shadows over the snow, the outcome is inevitable. Nature, in all its raw beauty and brutality, does not show favor. The wolves, now victors, gather around their prize, their survival in this harsh climate secured for a moment longer. It's a poignant reminder of the circle of life that rules this pristine wilderness, a reminder that every creature plays its part in the enduring saga of the natural world.\n"
+ "The bison makes a desperate attempt to escape, but the wolves latch onto their target, wearing it down through sheer persistence and teamwork.\n",
+ "\n",
+ "In these moments, nature's brutal elegance is laid bare—a primal dance where only the strongest and the most cunning can thrive.\n",
+ "\n",
+ "The bison, now overpowered and exhausted, faces its inevitable fate. The wolves have triumphed, securing a meal that will sustain their pack for days to come.\n",
+ "\n",
+ "And so, the cycle of life continues, as it has for millennia, in this unforgiving land where the struggle for survival is an unending battle.\n"
]
}
],
@@ -190,7 +197,7 @@
" },\n",
"]\n",
"params = {\n",
- " \"model\": \"gpt-4-vision-preview\",\n",
+ " \"model\": \"gpt-4o\",\n",
" \"messages\": PROMPT_MESSAGES,\n",
" \"max_tokens\": 500,\n",
"}\n",
@@ -216,7 +223,7 @@
"text/html": [
"\n",
" \n",
" "
@@ -266,7 +273,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.9.16"
+ "version": "3.11.8"
}
},
"nbformat": 4,
diff --git a/examples/How_to_combine_GPT4v_with_RAG_Outfit_Assistant.ipynb b/examples/How_to_combine_GPT4o_with_RAG_Outfit_Assistant.ipynb
similarity index 55%
rename from examples/How_to_combine_GPT4v_with_RAG_Outfit_Assistant.ipynb
rename to examples/How_to_combine_GPT4o_with_RAG_Outfit_Assistant.ipynb
index 80659a2..38d3561 100644
--- a/examples/How_to_combine_GPT4v_with_RAG_Outfit_Assistant.ipynb
+++ b/examples/How_to_combine_GPT4o_with_RAG_Outfit_Assistant.ipynb
@@ -4,21 +4,21 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "# How to combine GPT-4V with RAG - Create a Clothing Matchmaker App\n",
+ "# How to combine GPT-4o with RAG - Create a Clothing Matchmaker App\n",
"\n",
- "Welcome to the Clothing Matchmaker App Jupyter Notebook! This project demonstrates the power of the GPT-4V model in analyzing images of clothing items and extracting key features such as color, style, and type. The core of our app relies on this advanced image analysis model developed by OpenAI, which enables us to accurately identify the characteristics of the input clothing item.\n",
+ "Welcome to the Clothing Matchmaker App Jupyter Notebook! This project demonstrates the power of the GPT-4o model in analyzing images of clothing items and extracting key features such as color, style, and type. The core of our app relies on this advanced image analysis model developed by OpenAI, which enables us to accurately identify the characteristics of the input clothing item.\n",
"\n",
- "GPT-4V is a model that combines natural language processing with image recognition, allowing it to understand and generate responses based on both text and visual inputs.\n",
+ "GPT-4o is a model that combines natural language processing with image recognition, allowing it to understand and generate responses based on both text and visual inputs.\n",
"\n",
- "Building on the capabilities of the GPT-4V model, we employ a custom matching algorithm and the RAG technique to search our knowledge base for items that complement the identified features. This algorithm takes into account factors like color compatibility and style coherence to provide users with suitable recommendations. Through this notebook, we aim to showcase the practical application of these technologies in creating a clothing recommendation system.\n",
+ "Building on the capabilities of the GPT-4o model, we employ a custom matching algorithm and the RAG technique to search our knowledge base for items that complement the identified features. This algorithm takes into account factors like color compatibility and style coherence to provide users with suitable recommendations. Through this notebook, we aim to showcase the practical application of these technologies in creating a clothing recommendation system.\n",
"\n",
- "Using the combination of GPT-4 Vision + RAG (Retrieval-Augmented Generation) offers several advantages:\n",
+ "Using the combination of GPT-4o + RAG (Retrieval-Augmented Generation) offers several advantages:\n",
"\n",
- "1. **Contextual Understanding**: GPT-4 Vision can analyze input images and understand the context, such as the objects, scenes, and activities depicted. This allows for more accurate and relevant suggestions or information across various domains, whether it's interior design, cooking, or education.\n",
+ "1. **Contextual Understanding**: GPT-4o can analyze input images and understand the context, such as the objects, scenes, and activities depicted. This allows for more accurate and relevant suggestions or information across various domains, whether it's interior design, cooking, or education.\n",
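- "2. **Rich Knowledge Base**: RAG combines the generative capabilities of GPT-4 with a retrieval component that accesses a large corpus of information across different fields. This means the system can provide suggestions or insights based on a wide range of knowledge, from historical facts to scientific concepts.\n",
+ "2. **Rich Knowledge Base**: RAG combines the generative capabilities of GPT-4o with a retrieval component that accesses a large corpus of information across different fields. This means the system can provide suggestions or insights based on a wide range of knowledge, from historical facts to scientific concepts.\n",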
"3. **Customization**: The approach allows for easy customization to cater to specific user needs or preferences in various applications. Whether it's tailoring suggestions to a user's taste in art or providing educational content based on a student's learning level, the system can be adapted to deliver personalized experiences.\n",
"\n",
- "Overall, the GPT-4 Vision + RAG approach offers a powerful and flexible solution for various fashion-related applications, leveraging the strengths of both generative and retrieval-based AI techniques."
+ "Overall, the GPT-4o + RAG approach offers a powerful and flexible solution for various fashion-related applications, leveraging the strengths of both generative and retrieval-based AI techniques."
]
},
{
@@ -37,25 +37,9 @@
},
{
"cell_type": "code",
- "execution_count": 1,
+ "execution_count": null,
"metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Note: you may need to restart the kernel to use updated packages.\n",
- "Note: you may need to restart the kernel to use updated packages.\n",
- "Note: you may need to restart the kernel to use updated packages.\n",
- "Note: you may need to restart the kernel to use updated packages.\n",
- "Note: you may need to restart the kernel to use updated packages.\n",
- "Note: you may need to restart the kernel to use updated packages.\n",
- "\u001b[31mERROR: Could not find a version that satisfies the requirement concurrent (from versions: none)\u001b[0m\u001b[31m\n",
- "\u001b[0m\u001b[31mERROR: No matching distribution found for concurrent\u001b[0m\u001b[31m\n",
- "\u001b[0mNote: you may need to restart the kernel to use updated packages.\n"
- ]
- }
- ],
+ "outputs": [],
"source": [
"%pip install openai --quiet\n",
"%pip install tenacity --quiet\n",
@@ -86,7 +70,7 @@
"\n",
"client = OpenAI()\n",
"\n",
- "GPT_MODEL = \"gpt-4-vision-preview\"\n",
+ "GPT_MODEL = \"gpt-4o\"\n",
"EMBEDDING_MODEL = \"text-embedding-3-large\"\n",
"EMBEDDING_COST_PER_1K_TOKENS = 0.00013"
]
@@ -241,7 +225,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
- "1024it [00:02, 438.09it/s] \n"
+ "1024it [00:01, 724.12it/s] \n"
]
},
{
@@ -285,11 +269,11 @@
"4 2012.0 Ethnic Shree Women Multi Colored Patiala \n",
"\n",
" embeddings \n",
- "0 [0.006932149175554514, 0.00035877348273061216,... \n",
- "1 [-0.04372308403253555, -0.008888939395546913, ... \n",
- "2 [-0.02798476628959179, 0.058831773698329926, -... \n",
- "3 [-0.004207939375191927, 0.02941293641924858, -... \n",
- "4 [-0.05236775428056717, 0.015963038429617882, -... \n",
+ "0 [0.006903026718646288, 0.0004031236458104104, ... \n",
+ "1 [-0.04371623694896698, -0.008869604207575321, ... \n",
+ "2 [-0.027989011257886887, 0.05884069576859474, -... \n",
+ "3 [-0.004197604488581419, 0.029409468173980713, ... \n",
+ "4 [-0.05241541564464569, 0.015912825241684914, -... \n",
"Opened dataset successfully. Dataset has 1000 items of clothing along with their embeddings.\n"
]
}
@@ -387,7 +371,7 @@
"source": [
"### Analysis Module\n",
"\n",
- "In this module, we leverage `gpt-4-vision-preview` to analyze input images and extract important features like detailed descriptions, styles, and types. The analysis is performed through a straightforward API call, where we provide the URL of the image for analysis and request the model to identify relevant features.\n",
+ "In this module, we leverage `gpt-4o` to analyze input images and extract important features like detailed descriptions, styles, and types. The analysis is performed through a straightforward API call, where we provide the URL of the image for analysis and request the model to identify relevant features.\n",
"\n",
"To ensure the model returns accurate results, we use specific techniques in our prompt:\n",
"\n",
@@ -402,7 +386,7 @@
"3. **One Shot Example**: \n",
" - To further clarify the expected output, we provide the model with an example input description and a corresponding example output. Although this may increase the number of tokens used (and thus the cost of the call), it helps to guide the model and results in better overall performance.\n",
"\n",
- "By following this structured approach, we aim to obtain precise and useful information from the `gpt-4-vision-preview` model for further analysis and integration into our database."
+ "By following this structured approach, we aim to obtain precise and useful information from the `gpt-4o` model for further analysis and integration into our database."
]
},
{
@@ -518,7 +502,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
- "{'items': [\"Men's Light Blue Jeans\", \"Men's Black Leather Belt\", \"Men's Brown Loafers\"], 'category': 'Shirts', 'gender': 'Men'}\n"
+ "{'items': [\"Slim Fit Blue Men's Jeans\", \"White Men's Sneakers\", \"Men's Silver Watch\"], 'category': 'Shirts', 'gender': 'Men'}\n"
]
}
],
@@ -562,13 +546,13 @@
"output_type": "stream",
"text": [
"513 Remaining Items\n",
- "[\"Men's Light Blue Jeans\", \"Men's Black Leather Belt\", \"Men's Brown Loafers\"]\n"
+ "[\"Slim Fit Blue Men's Jeans\", \"White Men's Sneakers\", \"Men's Silver Watch\"]\n"
]
},
{
"data": {
"text/html": [
- ""
+ ""
],
"text/plain": [
""
@@ -616,7 +600,7 @@
"source": [
"### Guardrails\n",
"\n",
- "In the context of using Large Language Models (LLMs) like GPT-4V, \"guardrails\" refer to mechanisms or checks put in place to ensure that the model's output remains within desired parameters or boundaries. These guardrails are crucial for maintaining the quality and relevance of the model's responses, especially when dealing with complex or nuanced tasks.\n",
+ "In the context of using Large Language Models (LLMs) like GPT-4o, \"guardrails\" refer to mechanisms or checks put in place to ensure that the model's output remains within desired parameters or boundaries. These guardrails are crucial for maintaining the quality and relevance of the model's responses, especially when dealing with complex or nuanced tasks.\n",
"\n",
"Guardrails are useful for several reasons:\n",
"\n",
@@ -625,7 +609,7 @@
"3. **Safety**: They prevent the model from generating harmful, offensive, or inappropriate content.\n",
"4. **Contextual Relevance**: They ensure that the model's output is contextually relevant to the specific task or domain it is being used for.\n",
"\n",
- "In our case, we are using GPT-4V to analyze fashion images and suggest items that would complement an original outfit. To implement guardrails, we can **refine results**: After obtaining initial suggestions from GPT-4V, we can send the original image and the suggested items back to the model. We can then ask GPT-4V to evaluate whether each suggested item would indeed be a good fit for the original outfit.\n",
+ "In our case, we are using GPT-4o to analyze fashion images and suggest items that would complement an original outfit. To implement guardrails, we can **refine results**: After obtaining initial suggestions from GPT-4o, we can send the original image and the suggested items back to the model. We can then ask GPT-4o to evaluate whether each suggested item would indeed be a good fit for the original outfit.\n",
"\n",
"This gives the model the ability to self-correct and adjust its own output based on feedback or additional information. By implementing these guardrails and enabling self-correction, we can enhance the reliability and usefulness of the model's output in the context of fashion analysis and recommendation.\n",
"\n",
@@ -691,6 +675,60 @@
"execution_count": 15,
"metadata": {},
"outputs": [
+ {
+ "data": {
+ "image/jpeg": "/9j/4AAQSkZJRgABAQAAZABkAAD/7AARRHVja3kAAQAEAAAAVwAA/9sAQwABAQEBAQEBAQEBAQEBAgIDAgICAgIEAwMCAwUEBQUFBAQEBQYHBgUFBwYEBAYJBgcICAgICAUGCQoJCAoHCAgI/9sAQwEBAQECAgIEAgIECAUEBQgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgI/8AAEQgAUAA8AwERAAIRAQMRAf/EAB0AAAICAgMBAAAAAAAAAAAAAAAKCQsHCAECBgP/xAA2EAABAwMDAwIEAwYHAAAAAAABAgMEBQYRAAcICRIhEzEKFSJBFHGRJDJCYYGxFyMzQ1FScv/EAB0BAQACAgMBAQAAAAAAAAAAAAAGBwEFAgQIAwn/xAAxEQABBAECAwYGAgIDAAAAAAABAAIDEQQFIRIxUQYHQYGRsRMiYXGhwQgUIyQWMlL/2gAMAwEAAhEDEQA/AH+NEUDnXipzn+GHEq4HVLTTIm6iY72M+FPUaoBB/Vs/rqv+8SMnFjI8Hj2KtruekA1CVvVh/BCRB5R1e2atyl3BuGBVWZYgMwqR6ocCksyktFbrac4HqJLxRj7FKz5PjXu7+OMWLgaBFPlSNY51kWQOZq9yOY5eq8Z/yKycjO7RTQ47C4NppoE8huNuh5rwFJu2k05hysJqNLd9BsPNd6XXI8Fgq9MuhSR/muFWUkg+PIH8Steg/wDn+iwSNZLks4nf9QDYHUkgEA/UkDwCoVvYvVZmOcyB3COfIE9AATfkAeq9pW6YalGnstKh1uozGAFzvxLaOxopyOxnOUpwolKfuTkk5zqaRZUMzeKF7Xk9CD7FRkwyRO4ZmlgHgQfcqfTe66l3jI4Ib5Op/DRa5xSpUBJHhv5pRpEqJLT/AOkqdZ8fbJGvyI7/ALTn4eqZMJ5h7/ybH4K/Vb+Oee3KwInt/wDLfav0nVuPNHk27sHslb0xHpS4FoUaG6nGO1bcFlJGPzB199Pj4IGMPgAPwoXq0wkypZB4ucfUlZi121r0aIogutZbya/xOs1S0Mtph7kW7I/FO/uQe9T7HqkDyR+0duB5+vUO7dNvAvo5vurJ7qX1q4HVrvZV4HJe/b23b5V7/THqQ1tbQoFzLgQoy6ehD6WYMRinxluBSR2uuMxkPFSckl4KBJOdXL2L0+OTCjku21+qVM9tsqSLUZo3in8R97/K1QujbStW6qsxo0yPKAbYQlBWUh9K1qIHZ7YygZz7Y99TOeAF23SvUgKI48+2/X2BWXKTUd7LHpaK9Ho71bdXBC5TcxDL6JGEntWoEgo9wAEnGE47Rga7DIHQgCLw8j6rrvlZM7/Ieaco5EbA23E4c9GSnWbaLdt0+XZMxupxUvLeUwusw6bNmLK1kq7VSnXVYBwFOpSkAYGvN/ehky5UofkO4nF4bZ3NCgPQBel+5mOPFbK2AU0MJFddz7lNwRmGY0dmNHbSyw2kNoSPZKR4A/QDUgCrwm9yvvrKwjRFFd1cdv8AdC+ON9BqNhUqBX7fty5GbjuOCpLinnIjUWQhl5CUA97TEl6PIeTgH0mlqScowYt2txJJcYBotoIJHjX0+3NWJ3Y5+PBqdzmi5pDSeXEev35earutxK3el312l7hX1a0iy9y6tbkRm9KLKYejGl1mMoxiUpWSlxt9ppl5sgrSQv3+2rc7s8QxYBiBDmBx4SDdtO+/1FkKte+DIbLqYncwxyuaPiMII4XCxtfNpABG61/qG3e4Ui1a9umuKqn7cxLmh2y5VT2BHzVyHImJhpABP+hGfdUQAkEpGcqTqXZE1ZIjLvC/QqvIY/8AX4+Hxr1C2WsO1a/ddruRIVPYuWQqE4QGUfW0222ta1nv90JQhZKgBga2TJgRuV0Hso7Jqen37uJuXYnTuvq3qLXa/bVG2926tyNTm2lLCayqCzPcaitAEOqebRDQtQ9lBlJ
8Dx5b7cB7tS+TdrHjzcTe3kF6/wC7PGhj0V5lNPkY519GNAZZ6fMTXWim10+x/M/31MVTC7aIjRFjHep9+Ls/upKjKUmS3bdUcbI9woRHSCP66LBVaDvFQLSpWx+zdTtj06lNbtOmQnGETH3F091tvLrbjrxUpslSS6ewhISoY7R4FodlYGRMc1jeGzZ5b+ignaPUcjJc347y/hFCyTQ6brf3n5wtqHGr4drireE2C5G3HrW8FF3JuhTYCFRzWYcqJHTkecNRl09rtVnyV/dRzoZdUL895HKi0eq28WBw4jQdvE+i0P26iS6Hs1c0+nSVQgaLIjOqjLKPUaUyS4jI/hUMgj2OpTjycRAKjMrKuk9t09uDWxW2Gz3Gvely2KvXd6F7d2wh2q1ipvy1U9SKOwyG4rK1ekwhCFLbQlKR2JJAxk5qLOwYjlvmIt1mr+6t+DtBlf0GYLTwx0LAAF/c8zvvzq1KT7e2uS1aNERoiwpyUqU+jceN96xSzDFTiWZXJMYyFhDQdRAeUn1FHwlOQMk+AM6BCq1bd6c3S+P9uJEJyXJfiMBtgJJMnLKMI7R5KjntwM+VDwTq0dCfbL+irvUmf5PNOkdcK1bevvokclJVftVNH+VWZRLpgwHPoVSpkOXCktNjOO0oKC2f5FQ1XWO7hnB+qnc4uI/ZJ5QKO1RtkLmg02oKrMJ5C4bMvsUgPeuQ02Ag+Uk+oPGrHxXbgqBSiyrLm06DHta17dtqGhLcSnwI8FpI9koaaSgAf0SNVc91kkqxWihS9Briso0RGiLB/JfY23+TPH3ejj3ddQqNJty9LZqNszZMRZS6w1KYU0VpwRnHeCUnwoZSfBOstNG1gi9lXGi7artfuFxerFzGsRbisHdukivwG1qjLUINVaZf7CPqbLbrDhOPASk/cHU20d5kgew9CFE89gZO1w6hOydelMV7pF82oExJLM23IlNACsHvkVWGyjB/57nBjUTwG3M0Ve6kuW6onHlskidvGp8ul7C7PNVxFWqE7cO2KdMWtoIW/DXU4yErU2fYZX9ZTnwn7d2p9i/Kx9+APsoZJ80jSPEhWeKPY/mf76rZTtd9ERoiNEXBGQRkj8tEVXh8QhDOzHVs5JWNt3Vq1QKDW26LeT0UvqbYaqs+Ih6Spsj/AG3HkqdJ/wC7jnvjU87MkGPdRPW2/Psp9urvzYpHJP4bSzN4rVv/AOd3VcdXsy2q06/EMR+VcNPntuVJhUcKV2lLtNfeIyUlpHeCUka0MGI5mY5oFcNkfpbaTID8YEm7oJerowVWuct+onwlsKuusQ4sO8E3JPLzo7ZCKUw7PMftOMlSoiAMHJyfBHkTTUsz/RfJW5FefJRnBwwMtrfP9q0uT+6M4z98aq1Txc6IjREaIjRElT8Xzxc2ulWBxl5Y0SKzC38cuBdgSGI7Cu+46OYkmYn1FD6QqK4yspUrBKZS0+cJA3miak2B5EhppWr1LBdK22CyEp7EqW51tdKShwqze9cl7eXryUq5p9tuPlUdhdFtlliXJbSfA9ZdejtKKcA/hU93kDWz0WX42bJM3dte66Opx/DxmRnYqQ/4ZO31zerhsU/S2Xw1TaZdNSnNOp7jER8nfY7v5dyn2hn+etzrzWtwngbbjbzWv0suOS0noVaEj2Gq3UyRoiNERoiNESafxk21O7u5XGPiI/t1Zt73dbtPvSsfORSKe/LRFddpqRHXIDKVFCT2SEgqHbk4zkgHnE8NdZXINJBUB3JHh3eOzfQy6SV4chTN2futV638qLBqdMfS81S6uTUYq5jYHel1fyptxOUlZblNpP7uBttE1X+u4/LYP6Ws1TT/AIwG9EKbX4SPi3t6JvKXljIu6xr+vaC5CsGirpr6y/S4zraZst51haUlsP8A7E22o/ViM+CE5Oe5rusf2GNYAR4m/wALr6XpvwXFx36J2jUaW5RoiNERoiNEXBAPvn9dEURPW06dl8dTbhHUOO+2d1WfaW4ES6aTdNK
eryXPwMlcZTiHGHXGkqca72ZDuFoSSFJSD4J0DqNrLa8VrD0Huj1u30sqLvpUd4d1LSvG4byRSYzdJt1clyDT24apKvWW8+lCnXlmWU+EAJSgDzk6+0s3H4LiGgckwrr4rKNERoi//9k=",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "The items match!\n",
+ "The black shirt and blue jeans create a classic and casual outfit that works well together.\n"
+ ]
+ },
+ {
+ "data": {
+ "image/jpeg": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "The items match!\n",
+ "The black shirt and the white sneakers with red and beige accents can work well together. Black is a versatile color that pairs well with many shoe options, and the white sneakers can add a sporty and casual touch to the outfit.\n"
+ ]
+ },
+ {
+ "data": {
+ "image/jpeg": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "The items match!\n",
+ "The black button-up shirt is casual and versatile, making it compatible with the white and red athletic shoes for a relaxed and sporty look.\n"
+ ]
+ },
{
"data": {
"image/jpeg": "/9j/4AAQSkZJRgABAQAAAQABAAD/7AARRHVja3kAAQAEAAAAWAAA/+EAGEV4aWYAAElJKgAIAAAAAAAAAAAAAAD/4QNvaHR0cDovL25zLmFkb2JlLmNvbS94YXAvMS4wLwA8P3hwYWNrZXQgYmVnaW49Iu+7vyIgaWQ9Ilc1TTBNcENlaGlIenJlU3pOVGN6a2M5ZCI/PiA8eDp4bXBtZXRhIHhtbG5zOng9ImFkb2JlOm5zOm1ldGEvIiB4OnhtcHRrPSJBZG9iZSBYTVAgQ29yZSA1LjAtYzA2MSA2NC4xNDA5NDksIDIwMTAvMTIvMDctMTA6NTc6MDEgICAgICAgICI+IDxyZGY6UkRGIHhtbG5zOnJkZj0iaHR0cDovL3d3dy53My5vcmcvMTk5OS8wMi8yMi1yZGYtc3ludGF4LW5zIyI+IDxyZGY6RGVzY3JpcHRpb24gcmRmOmFib3V0PSIiIHhtbG5zOnhtcE1NPSJodHRwOi8vbnMuYWRvYmUuY29tL3hhcC8xLjAvbW0vIiB4bWxuczpzdFJlZj0iaHR0cDovL25zLmFkb2JlLmNvbS94YXAvMS4wL3NUeXBlL1Jlc291cmNlUmVmIyIgeG1sbnM6eG1wPSJodHRwOi8vbnMuYWRvYmUuY29tL3hhcC8xLjAvIiB4bXBNTTpPcmlnaW5hbERvY3VtZW50SUQ9InhtcC5kaWQ6OEEwOUUwRDBBMDdDRTExMTkwNzJCMkFERDI5ODUyODYiIHhtcE1NOkRvY3VtZW50SUQ9InhtcC5kaWQ6NTFGOThGNUFCM0FDMTFFMTlFOUVBRDhEN0Q5NUZFODAiIHhtcE1NOkluc3RhbmNlSUQ9InhtcC5paWQ6NTFGOThGNTlCM0FDMTFFMTlFOUVBRDhEN0Q5NUZFODAiIHhtcDpDcmVhdG9yVG9vbD0iQWRvYmUgUGhvdG9zaG9wIENTNS4xIFdpbmRvd3MiPiA8eG1wTU06RGVyaXZlZEZyb20gc3RSZWY6aW5zdGFuY2VJRD0ieG1wLmlpZDo4QTA5RTBEMEEwN0NFMTExOTA3MkIyQUREMjk4NTI4NiIgc3RSZWY6ZG9jdW1lbnRJRD0ieG1wLmRpZDo4QTA5RTBEMEEwN0NFMTExOTA3MkIyQUREMjk4NTI4NiIvPiA8L3JkZjpEZXNjcmlwdGlvbj4gPC9yZGY6UkRGPiA8L3g6eG1wbWV0YT4gPD94cGFja2V0IGVuZD0iciI/Pv/bAEMAAQEBAQEBAQEBAQEBAQICAwICAgICBAMDAgMFBAUFBQQEBAUGBwYFBQcGBAQGCQYHCAgICAgFBgkKCQgKBwgICP/bAEMBAQEBAgICBAICBAgFBAUICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICP/AABEIAFAAPAMBEQACEQEDEQH/xAAeAAACAgMBAAMAAAAAAAAAAAAACggJAQYHBAIDBf/EADIQAAEDAwMCBQIEBwEAAAAAAAECAwQFBhEABxIIIQkTIjFBFGEVUXGBChYjMjNCoaL/xAAcAQEAAQUBAQAAAAAAAAAAAAAABwIDBAUGCAH/xAA2EQABAwIDBgMFBwUAAAAAAAABAAIDBBEFEiEGBzFBUWETIpFxgaGx0SMyQlKS8PEIFCQlYv/aAAwDAQACEQMRAD8Af40RQ48QWmyar0XdR8eIwZEhq2JMxKR8Fkpe5ft5ZP7a5jbOIvwucD8p+Gq6/YGYMxmmcTbzgeuiSoj7nUKqdNe4ViVuAmrWvWtxPxirREupR+INw6epptKgRk5W+sJUCQFf3JI76jTZid0dGfaD6A/VTbtHhgq8VZDdozeW7jZou4aknQAW1KrivjcCzalWqEKhcjUKpqTNREodYpKFv2080lDbkFhUhSA8n1tKGFAKI9IIAJ2c9PI5oLuHtsNeF1JGw2P0FDiEs
VEQDG1jA4sa9xLQ4SFt9C1znAhtxcBupspe2fcVGqNM27vanIMO96PTGqPVAqlMszFQZC0lh1holxsNocS4eYwDzWOQ4nNWR7GNy/d5j99Fytbi9NimIVza5wErnOexxbwcHA5bAkASAW5gG2uiuf3bv6TuU50Qbnz6ZMp1TXtPWrbcW8hIzJpNQ+nXgJ9JHF5Cx9l4PcHXF70ZC+Cnlt+Ej9Kt7qIhFPWQ3/G136hdNT7ZNPMbc2ExITwkIosFDgxjChHQD/3XoHCWltLEDya35BeWsYcHVcpHAud8yt41sFrkaItK3JtOJfm3l92RPSpcGs0abSngPcoeYW2cZ+cL1iV9OJYXxHg4EeoWZh1UYKhkzeLSD6G6Qh2L2Rpu4HTnRIsypVql3lCvq54zqYlLbmpeaS5GSfqAt1vglOF8SD35K9xqPdgtnxWUly7LbT4D3KTt5+2T8NxLIxgeHC+ptbUr43B0U7ovTYVKthW0V027KkoYZRW2pLKWQo4CvKKJLZx+SDqQHbuXBocyYen0K4GDfOL5XUx9zvqFu1q9BG50V2nrqVybW2DWOJZIjUSVJjMspOW0pda8pXHOTx4hIPtj5pbsIMge+U5TroPXjw9vqrg3ugyEMpwHdz9Bqrarg2DlW7sX4e1kyLqpl41aLVLthSahFiORmimbKic0NodWtfEFaPdRye/bUTb0MJjaylpWG4LyNeNja/BTBuk2ifPJW1rm2+zB04Xbe3qmb4zLcdhqOygNstpCEJ/JI7D/AINTM1oAsFADnEm5X3aqXxGiLxVGWIECZOLEiUGWlu+WygqW5xSVcUpHcqOMAD3J1RI7K0noq42ZnBvVJkdKlSjXRQrorqKUm1Y7151SWyyEgKQh5LSyheB6FBfmD5I+R8a47dtUltA4DQ5z8QCF2u+fD/8AbszG48NuvWxIK9/WDSb12msquX7t9GVcD+Y0hAgttRpbKlOoRyW0ngy+ElfIuANuBIVkuHjqWKSoe5ua17WUHVdM1rw29rrvnT/flXuK2KzJvB2BMj06MlTVQcQgOKS2jCzn3wSkqGe+FatVlSGHKr+H0xkGZdh6ervvS/N5OmY7gUF5uzqxWVSbVYdX5iktszT9SCxx5Npy02suK7L4nBw2ceasZrJa/aCCR+sWezB2YfMbd3c+3ZexcCwiHC9laiOM2nyB0h6F4u1t+oaRpyv3TK47gHU6hebFnX1EaIsEA9joioY61elLa7pxuZW6O0tLn281fFyPVOvUwP8AOCzUPLHN6M2e7Xnc1KWgEo5JBSE5I1i4ZhkNOX+FoHm9uQPb2q5tJjdRWNh8c3MYLQeZHK/W3Xj1XEaFtBG6jbss3aarkuUOrx5TUxAPuwiO64P/AGhvvroopxFC8hce6Dxp2NP70UHNj0V2o7bzKG407InLxFdZBwXljCVJz8DIOdMSaHOKowyQsTJfSD0q2ZtrRLO3Xq0u47q3NkUJEZmRU5XmMUKK6S4qLAYSAhlslRJPdZyr1YURrh6DZ2GGc1byXyWsCfwjo0DQD981KtftdUVNG2hY0RwixLRfzO/M5x1cfhw00CndroFzKNERoiNEVbnicQVSdmLMfYiPSpbVzMhPHGEpMd7kTn9BquMarAryMoJ6qKnRezIVvPt/Uo0XlMZo1UKGXfQpTwYICFe4Hc+/tgg6vykGFwPZYFK3/IaexUB+mulrXKp5qDDUec9U3DLQkji08XSVgfGAc6ya193GywaRlm3Kaks+F+G2nbNOJbKmKfHaJQrkklLaQcH5H31rXcV1MQs0BbFr4q0aIjREaIoP+ILUo9K6e5Ex6nTKmsVynIbbZSFEKU4UlRB/1CSrPt+uq2DjZYNeQGAnqFATpFueKrqH2aMeXHm0+RCqUF0trAcafXFJHNvOUj+iQD99XtDA4A66LBjuKhhtpquF7abbs2huVf8Aa8V+oVCDTaxJbaUSPOeJkOBo57DzCngoj5Orc8pLwOqqhiDWO7XTI
NMiNwKdAgtILbbLKGkpJzxCUgYz+2rZN9VuGNsLL3a+KpGiI0RGiKIPXfTqhUelrdAU2lmrSGGo0tTaVJCm225LaluAq7elIUo5+AdXoG3cAsHESRCSOSqo6FHqNT97rUr9VZpMBpP1an3vQhqKlEJ1ZUV9hhIScq7DAJ1nVNMWxEjjp81pMOqgZwHcNfkty6W6SzuH1D160otQNao7UiNcT84H/O2HFvBwH8lhTacffVFTBkOci3JXKGXxHeGDfW6vjHbWuXSo0RGiI0RGiL86sUuHXKTU6LUWg/T5kdyK+g4wttaSlQ7/AGUdfQbL44AixSmtzV63bOqO6m1qXosiu0OXV6W3AEry262ULW2tDSwc+oJUjie2BjPc67KN32bZD2/n6qN3058V8YPX39vaeQU5vCNrVybpXZdm4bVv/wAlQ4kJpipQ358VySWXAsRWlNRyoISrg453J4+WBklRCdRi08Z8jSCey6HA6CZjs72lo7gi/YX6K/XWjXUI0RGiI0RGiLCsYyTgDvoiRT6n92ek/pB8QPdfZHea5alekRNablCfFttyY7CcfxMy+645xcW15pCw2jAKScnBArn3j0lLKKN7XGQAaNF+Pv8A4XTYL/TnjeL4TJtBTyxMpWlwLpJMpBaLkWynXprc8gpg/wANFcNzObz9f0ejbaS6Ds/cVUauqBVnYqWPKdVU56Y8JCQSQn6V5D2FY4qKwAABmPNja+mlnqP7aRjwXFxym9rnS9tBzFugU0b+cHxSCmw12K0s8D2QxxAzMLA4sbZ2TN5nDg4uI+84jgBduzUgLzijREaIjREaIsKBIIGhRJryvBL68Lq6+bY393RuCPfNAbvSPWKxX3rgpktqq0/68PvR3WH4wkpYU0VshkBXFJCR21Dz8Ixh2ICWejie24u8SOBH/Vj25AL2pFtpsfFsm/D8PxysikyH7B1NEY3E6+Hmac1iQLPL9LA25Jv20bDsmwIL1MsazrVsumuOF1cek05mG0tXtyKGkpBOPk99SxTUcUIIiaGg9AB8l4+xPGqytcH1krpHDQFzi4j1JW2ayVrEaIjRF//Z",
@@ -706,12 +744,12 @@
"output_type": "stream",
"text": [
"The items match!\n",
- "The solid color of the first item tends to be versatile and can pair well with the second item, which appears to be a casual style of pants, creating a cohesive and suitable outfit for casual occasions.\n"
+ "The black shirt pairs well with the light blue jeans, creating a classic and balanced color combination that is casual and stylish.\n"
]
},
{
"data": {
- "image/jpeg": "",
+ "image/jpeg": "",
"text/plain": [
""
]
@@ -724,43 +762,7 @@
"output_type": "stream",
"text": [
"The items match!\n",
- "The solid color of the shirt provides a versatile base that can easily be matched with different styles and colors of footwear. The shoes appear to be dressy casual loafers, which can complement the shirt for a smart casual look.\n"
- ]
- },
- {
- "data": {
- "image/jpeg": "",
- "text/plain": [
- ""
- ]
- },
- "metadata": {},
- "output_type": "display_data"
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "The items match!\n",
- "The dark-colored casual shirt can easily be paired with a variety of shoe styles, and the shoes presented have a casual yet slightly polished design that would complement the shirt for a casual to smart-casual look.\n"
- ]
- },
- {
- "data": {
- "image/jpeg": "",
- "text/plain": [
- ""
- ]
- },
- "metadata": {},
- "output_type": "display_data"
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "The items match!\n",
- "A dark-colored button-up shirt typically pairs well with dark jeans for a casual yet put-together look.\n"
+ "Both the black shirt and the black watch have a sleek and coordinated look, making them suitable to be worn together as part of an outfit.\n"
]
}
],
@@ -795,7 +797,7 @@
"source": [
"### Conclusion\n",
"\n",
- "In this Jupyter Notebook, we explored the application of GPT-4 with Vision and other machine learning techniques to the domain of fashion. We demonstrated how to analyze images of clothing items, extract relevant features, and use this information to find matching items that complement an original outfit. Through the implementation of guardrails and self-correction mechanisms, we refined the model's suggestions to ensure they are accurate and contextually relevant.\n",
+ "In this Jupyter Notebook, we explored the application of GPT-4o and other machine learning techniques to the domain of fashion. We demonstrated how to analyze images of clothing items, extract relevant features, and use this information to find matching items that complement an original outfit. Through the implementation of guardrails and self-correction mechanisms, we refined the model's suggestions to ensure they are accurate and contextually relevant.\n",
"\n",
"This approach has several practical uses in the real world, including:\n",
"\n",
@@ -803,7 +805,7 @@
"2. **Virtual Wardrobe Applications**: Users can upload images of their own clothing items to create a virtual wardrobe and receive suggestions for new items that match their existing pieces.\n",
"3. **Fashion Design and Styling**: Fashion designers and stylists can use this tool to experiment with different combinations and styles, streamlining the creative process.\n",
"\n",
- "However, one of the considerations to keep in mind is **cost**. The use of LLMs and image analysis models can incur costs, especially if used extensively. It's important to consider the cost-effectiveness of implementing these technologies. `gpt-4-vision-preview` is priced at `$0.01` per 1000 tokens. This adds up to `$0.00255` for one 256px x 256px image.\n",
+ "However, one of the considerations to keep in mind is **cost**. The use of LLMs and image analysis models can incur costs, especially if used extensively. It's important to consider the cost-effectiveness of implementing these technologies.\n",
"\n",
"Overall, this notebook serves as a foundation for further exploration and development in the intersection of fashion and AI, opening doors to more personalized and intelligent fashion recommendation systems."
]
@@ -825,7 +827,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.12.1"
+ "version": "3.11.8"
}
},
"nbformat": 4,
diff --git a/examples/gpt4o/data/keynote_recap.mp4 b/examples/gpt4o/data/keynote_recap.mp4
new file mode 100644
index 0000000..818b90e
Binary files /dev/null and b/examples/gpt4o/data/keynote_recap.mp4 differ
diff --git a/examples/gpt4o/data/triangle.png b/examples/gpt4o/data/triangle.png
new file mode 100644
index 0000000..cb522cb
Binary files /dev/null and b/examples/gpt4o/data/triangle.png differ
diff --git a/examples/gpt4o/introduction_to_gpt4o.ipynb b/examples/gpt4o/introduction_to_gpt4o.ipynb
new file mode 100644
index 0000000..8007429
--- /dev/null
+++ b/examples/gpt4o/introduction_to_gpt4o.ipynb
@@ -0,0 +1,819 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Introduction to GPT-4o\n",
+ "GPT-4o (\"o\" for \"omni\") is designed to handle a combination of text, audio, and video inputs, and can generate outputs in text, audio, and image formats.\n",
+ "\n",
+ "### Background\n",
+ "Before GPT-4o, users could interact with ChatGPT using Voice Mode, which operated with three separate models. GPT-4o integrates these capabilities into a single model that's trained across text, vision, and audio. This unified approach means that all inputs, whether text, visual, or auditory, are processed cohesively by the same neural network.\n",
+ "\n",
+ "### Current API Capabilities\n",
+ "Currently, the API supports `{text, image}` inputs only, with `{text}` outputs, the same modalities as `gpt-4-turbo`. Additional modalities, including audio, will be introduced soon. This guide will help you get started with using GPT-4o for text, image, and video understanding.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Getting Started"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Install the OpenAI SDK for Python\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%pip install --upgrade openai --quiet"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Configure the OpenAI client and submit a test request\n",
+ "To set up the client, we need an API key to use with our requests. Skip these steps if you already have an API key.\n",
+ "\n",
+ "You can get an API key by following these steps:\n",
+ "1. [Create a new project](https://help.openai.com/en/articles/9186755-managing-your-work-in-the-api-platform-with-projects)\n",
+ "2. [Generate an API key in your project](https://platform.openai.com/api-keys)\n",
+ "3. (RECOMMENDED, BUT NOT REQUIRED) [Set up your API key for all projects as an environment variable](https://platform.openai.com/docs/quickstart/step-2-set-up-your-api-key)\n",
+ "\n",
+ "Once this is set up, let's start with a simple `{text}` input to the model for our first request. We'll use both `system` and `user` messages, and we'll receive a response from the `assistant` role."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from openai import OpenAI \n",
+ "import os\n",
+ "\n",
+ "## Set the API key and model name\n",
+ "MODEL=\"gpt-4o\"\n",
+ "client = OpenAI(api_key=os.environ.get(\"OPENAI_API_KEY\", \"\"))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Assistant: Sure! The sum of 2 + 2 is 4. If you have any more questions or need further assistance, feel free to ask!\n"
+ ]
+ }
+ ],
+ "source": [
+ "completion = client.chat.completions.create(\n",
+ " model=MODEL,\n",
+ " messages=[\n",
+ " {\"role\": \"system\", \"content\": \"You are a helpful assistant. Help me with my math homework!\"}, # <-- This is the system message that provides context to the model\n",
+ " {\"role\": \"user\", \"content\": \"Hello! Could you solve 2+2?\"} # <-- This is the user message for which the model will generate a response\n",
+ " ]\n",
+ ")\n",
+ "\n",
+ "print(\"Assistant: \" + completion.choices[0].message.content)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Image Processing\n",
+ "GPT-4o can directly process images and take intelligent actions based on the image. We can provide images in two formats:\n",
+ "1. Base64 Encoded\n",
+ "2. URL\n",
+ "\n",
+ "Let's first view the image we'll use, then try sending it to the API both as Base64 and as a URL."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "from IPython.display import Image, display, Audio, Markdown\n",
+ "import base64\n",
+ "\n",
+ "IMAGE_PATH = \"data/triangle.png\"\n",
+ "\n",
+ "# Preview image for context\n",
+ "display(Image(IMAGE_PATH))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Base64 Image Processing"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/markdown": [
+ "To find the area of the triangle, we can use Heron's formula. Heron's formula states that the area of a triangle with sides of length \\(a\\), \\(b\\), and \\(c\\) is:\n",
+ "\n",
+ "\\[ \\text{Area} = \\sqrt{s(s-a)(s-b)(s-c)} \\]\n",
+ "\n",
+ "where \\(s\\) is the semi-perimeter of the triangle:\n",
+ "\n",
+ "\\[ s = \\frac{a + b + c}{2} \\]\n",
+ "\n",
+ "For the given triangle, the side lengths are \\(a = 5\\), \\(b = 6\\), and \\(c = 9\\).\n",
+ "\n",
+ "First, calculate the semi-perimeter \\(s\\):\n",
+ "\n",
+ "\\[ s = \\frac{5 + 6 + 9}{2} = \\frac{20}{2} = 10 \\]\n",
+ "\n",
+ "Now, apply Heron's formula:\n",
+ "\n",
+ "\\[ \\text{Area} = \\sqrt{10(10-5)(10-6)(10-9)} \\]\n",
+ "\\[ \\text{Area} = \\sqrt{10 \\cdot 5 \\cdot 4 \\cdot 1} \\]\n",
+ "\\[ \\text{Area} = \\sqrt{200} \\]\n",
+ "\\[ \\text{Area} = 10\\sqrt{2} \\]\n",
+ "\n",
+ "So, the area of the triangle is \\(10\\sqrt{2}\\) square units."
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# Open the image file and encode it as a base64 string\n",
+ "def encode_image(image_path):\n",
+ " with open(image_path, \"rb\") as image_file:\n",
+ " return base64.b64encode(image_file.read()).decode(\"utf-8\")\n",
+ "\n",
+ "base64_image = encode_image(IMAGE_PATH)\n",
+ "\n",
+ "response = client.chat.completions.create(\n",
+ " model=MODEL,\n",
+ " messages=[\n",
+ " {\"role\": \"system\", \"content\": \"You are a helpful assistant that responds in Markdown. Help me with my math homework!\"},\n",
+ " {\"role\": \"user\", \"content\": [\n",
+ " {\"type\": \"text\", \"text\": \"What's the area of the triangle?\"},\n",
+ " {\"type\": \"image_url\", \"image_url\": {\n",
+ " \"url\": f\"data:image/png;base64,{base64_image}\"}\n",
+ " }\n",
+ " ]}\n",
+ " ],\n",
+ " temperature=0.0,\n",
+ ")\n",
+ "\n",
+ "display(Markdown(response.choices[0].message.content))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### URL Image Processing"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/markdown": [
+ "To find the area of the triangle, we can use Heron's formula. First, we need to find the semi-perimeter of the triangle.\n",
+ "\n",
+ "The sides of the triangle are 6, 5, and 9.\n",
+ "\n",
+ "1. Calculate the semi-perimeter \\( s \\):\n",
+ "\\[ s = \\frac{a + b + c}{2} = \\frac{6 + 5 + 9}{2} = 10 \\]\n",
+ "\n",
+ "2. Use Heron's formula to find the area \\( A \\):\n",
+ "\\[ A = \\sqrt{s(s-a)(s-b)(s-c)} \\]\n",
+ "\n",
+ "Substitute the values:\n",
+ "\\[ A = \\sqrt{10(10-6)(10-5)(10-9)} \\]\n",
+ "\\[ A = \\sqrt{10 \\cdot 4 \\cdot 5 \\cdot 1} \\]\n",
+ "\\[ A = \\sqrt{200} \\]\n",
+ "\\[ A = 10\\sqrt{2} \\]\n",
+ "\n",
+ "So, the area of the triangle is \\( 10\\sqrt{2} \\) square units."
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "response = client.chat.completions.create(\n",
+ " model=MODEL,\n",
+ " messages=[\n",
+ " {\"role\": \"system\", \"content\": \"You are a helpful assistant that responds in Markdown. Help me with my math homework!\"},\n",
+ " {\"role\": \"user\", \"content\": [\n",
+ " {\"type\": \"text\", \"text\": \"What's the area of the triangle?\"},\n",
+ " {\"type\": \"image_url\", \"image_url\": {\n",
+ " \"url\": \"https://upload.wikimedia.org/wikipedia/commons/e/e2/The_Algebra_of_Mohammed_Ben_Musa_-_page_82b.png\"}\n",
+ " }\n",
+ " ]}\n",
+ " ],\n",
+ " temperature=0.0,\n",
+ ")\n",
+ "\n",
+ "display(Markdown(response.choices[0].message.content))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Video Processing\n",
+ "While it's not possible to directly send a video to the API, GPT-4o can understand videos if you sample frames and then provide them as images. It performs better at this task than GPT-4 Turbo.\n",
+ "\n",
+ "Since GPT-4o in the API does not yet support audio input (as of May 2024), we'll use a combination of GPT-4o and Whisper to process both the audio and the visuals of a provided video, and showcase two use cases:\n",
+ "1. Summarization\n",
+ "2. Question and Answering\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Setup for Video Processing\n",
+ "We'll use two Python packages for video processing: `opencv-python` and `moviepy`.\n",
+ "\n",
+ "These require [ffmpeg](https://ffmpeg.org/about.html), so make sure to install it beforehand. Depending on your OS, you may need to run `brew install ffmpeg` or `sudo apt install ffmpeg`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%pip install opencv-python --quiet\n",
+ "%pip install \"moviepy<2\" --quiet"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Process the video into two components: frames and audio"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import cv2\n",
+ "from moviepy.editor import VideoFileClip\n",
+ "import time\n",
+ "import base64\n",
+ "\n",
+ "# We'll be using the OpenAI DevDay Keynote Recap video. You can review the video here: https://www.youtube.com/watch?v=h02ti0Bl6zk\n",
+ "VIDEO_PATH = \"data/keynote_recap.mp4\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "MoviePy - Writing audio in data/keynote_recap.mp3\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ " \r"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "MoviePy - Done.\n",
+ "Extracted 218 frames\n",
+ "Extracted audio to data/keynote_recap.mp3\n"
+ ]
+ }
+ ],
+ "source": [
+ "def process_video(video_path, seconds_per_frame=2):\n",
+ " base64Frames = []\n",
+ " base_video_path, _ = os.path.splitext(video_path)\n",
+ "\n",
+ " video = cv2.VideoCapture(video_path)\n",
+ " total_frames = int(video.get(cv2.CAP_PROP_FRAME_COUNT))\n",
+ " fps = video.get(cv2.CAP_PROP_FPS)\n",
+ " frames_to_skip = int(fps * seconds_per_frame)\n",
+ " curr_frame=0\n",
+ "\n",
+ " # Loop through the video and extract frames at specified sampling rate\n",
+ " while curr_frame < total_frames - 1:\n",
+ " video.set(cv2.CAP_PROP_POS_FRAMES, curr_frame)\n",
+ " success, frame = video.read()\n",
+ " if not success:\n",
+ " break\n",
+ " _, buffer = cv2.imencode(\".jpg\", frame)\n",
+ " base64Frames.append(base64.b64encode(buffer).decode(\"utf-8\"))\n",
+ " curr_frame += frames_to_skip\n",
+ " video.release()\n",
+ "\n",
+ " # Extract audio from video\n",
+ " audio_path = f\"{base_video_path}.mp3\"\n",
+ " clip = VideoFileClip(video_path)\n",
+ " clip.audio.write_audiofile(audio_path, bitrate=\"32k\")\n",
+ " clip.audio.close()\n",
+ " clip.close()\n",
+ "\n",
+ " print(f\"Extracted {len(base64Frames)} frames\")\n",
+ " print(f\"Extracted audio to {audio_path}\")\n",
+ " return base64Frames, audio_path\n",
+ "\n",
+ "# Extract 1 frame per second. You can adjust the `seconds_per_frame` parameter to change the sampling rate\n",
+ "base64Frames, audio_path = process_video(VIDEO_PATH, seconds_per_frame=1)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/jpeg": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "image/jpeg": {
+ "width": 600
+ }
+ },
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 11,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "## Display the frames and audio for context\n",
+ "display_handle = display(None, display_id=True)\n",
+ "for img in base64Frames:\n",
+ " display_handle.update(Image(data=base64.b64decode(img.encode(\"utf-8\")), width=600))\n",
+ " time.sleep(0.025)\n",
+ "\n",
+ "Audio(audio_path)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Example 1: Summarization\n",
+ "Now that we have both the video frames and the audio, let's generate a video summary in three different ways and compare the results of using the model with different input modalities. We expect the summary generated with both visual and audio inputs to be the most accurate, since the model can use the full context of the video.\n",
+ "\n",
+ "1. Visual Summary\n",
+ "2. Audio Summary\n",
+ "3. Visual + Audio Summary\n",
+ "\n",
+ "#### Visual Summary\n",
+ "The visual summary is generated by sending the model only the frames from the video. With just the frames, the model is likely to capture the visual aspects, but will miss any details discussed by the speaker."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/markdown": [
+ "## Video Summary\n",
+ "\n",
+ "The video appears to be a presentation from OpenAI's DevDay event. Here is a summary based on the provided frames:\n",
+ "\n",
+ "1. **Introduction**:\n",
+ " - The video starts with the title \"OpenAI DevDay\" and a \"Keynote Recap\" slide.\n",
+ " - The event venue is shown, with attendees gathering and the stage being set up.\n",
+ "\n",
+ "2. **Keynote Presentation**:\n",
+ " - A speaker, likely a representative from OpenAI, takes the stage to deliver the keynote address.\n",
+ " - The presentation covers several key topics and announcements:\n",
+ " - **GPT-4 Turbo**: Introduction of GPT-4 Turbo, highlighting its capabilities and improvements.\n",
+ " - **JSON Mode**: A feature that allows structured data output in JSON format.\n",
+ " - **Function Calling**: Demonstration of how the model can call functions based on user instructions.\n",
+ " - **Enhanced Features**: Discussion on improvements such as increased context length, better control, and enhanced knowledge.\n",
+ " - **DALL-E 3**: Introduction of DALL-E 3, a new version of the image generation model.\n",
+ " - **Custom Models**: Announcement of the ability to create custom models tailored to specific needs.\n",
+ " - **Token Efficiency**: Explanation of the new token efficiency, with 3x less input tokens and 2x less output tokens.\n",
+ " - **API Enhancements**: Overview of new API features, including threading, retrieval, code interpreter, and function calling.\n",
+ "\n",
+ "3. **Closing Remarks**:\n",
+ " - The speaker emphasizes the importance of building with natural language and the potential of the new tools and features.\n",
+ " - The presentation concludes with a thank you to the audience and a final display of the OpenAI DevDay logo.\n",
+ "\n",
+ "4. **Audience Engagement**:\n",
+ " - The video shows the audience's reactions and engagement during the presentation, with applause and focused attention.\n",
+ "\n",
+ "Overall, the video captures the highlights of OpenAI's DevDay event, showcasing new advancements and features in their AI models and tools."
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "response = client.chat.completions.create(\n",
+ " model=MODEL,\n",
+ " messages=[\n",
+ " {\"role\": \"system\", \"content\": \"You are generating a video summary. Please provide a summary of the video. Respond in Markdown.\"},\n",
+ " {\"role\": \"user\", \"content\": [\n",
+ " {\"type\": \"text\", \"text\": \"These are the frames from the video.\"},\n",
+ " *map(lambda x: {\"type\": \"image_url\", \n",
+ " \"image_url\": {\"url\": f'data:image/jpeg;base64,{x}', \"detail\": \"low\"}}, base64Frames)\n",
+ " ],\n",
+ " }\n",
+ " ],\n",
+ " temperature=0,\n",
+ ")\n",
+ "display(Markdown(response.choices[0].message.content))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The results are as expected: the model captures the high-level aspects of the video's visuals but misses the details provided in the speech.\n",
+ "\n",
+ "#### Audio Summary\n",
+ "The audio summary is generated by sending the model the audio transcript. With just the audio, the model is likely to be biased toward the spoken content and will miss the context provided by the slides and other visuals.\n",
+ "\n",
+ "`{audio}` input for GPT-4o isn't currently available but is coming soon. For now, we use the existing `whisper-1` model to transcribe the audio."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/markdown": [
+ "### Summary\n",
+ "\n",
+ "Welcome to OpenAI's first-ever Dev Day. Key announcements include:\n",
+ "\n",
+ "- **GPT-4 Turbo**: A new model supporting up to 128,000 tokens of context, featuring JSON mode for valid JSON responses, improved instruction following, and better knowledge retrieval from external documents or databases. It is also significantly cheaper than GPT-4.\n",
+ "- **New Features**: \n",
+ " - **Dolly 3**, **GPT-4 Turbo with Vision**, and a new **Text-to-Speech model** are now available in the API.\n",
+ " - **Custom Models**: A program where OpenAI researchers help companies create custom models tailored to their specific use cases.\n",
+ " - **Increased Rate Limits**: Doubling tokens per minute for established GPT-4 customers and allowing requests for further rate limit changes.\n",
+ "- **GPTs**: Tailored versions of ChatGPT for specific purposes, programmable through conversation, with options for private or public sharing, and a forthcoming GPT Store.\n",
+ "- **Assistance API**: Includes persistent threads, built-in retrieval, a code interpreter, and improved function calling.\n",
+ "\n",
+ "OpenAI is excited about the future of AI integration and looks forward to seeing what users will create with these new tools. The event concludes with an invitation to return next year for more advancements."
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# Transcribe the audio\n",
+ "transcription = client.audio.transcriptions.create(\n",
+ " model=\"whisper-1\",\n",
+ " file=open(audio_path, \"rb\"),\n",
+ ")\n",
+ "## OPTIONAL: Uncomment the line below to print the transcription\n",
+ "#print(\"Transcript: \", transcription.text + \"\\n\\n\")\n",
+ "\n",
+ "response = client.chat.completions.create(\n",
+ " model=MODEL,\n",
+ " messages=[\n",
+ " {\"role\": \"system\", \"content\":\"\"\"You are generating a transcript summary. Create a summary of the provided transcription. Respond in Markdown.\"\"\"},\n",
+ " {\"role\": \"user\", \"content\": [\n",
+ " {\"type\": \"text\", \"text\": f\"The audio transcription is: {transcription.text}\"}\n",
+ " ],\n",
+ " }\n",
+ " ],\n",
+ " temperature=0,\n",
+ ")\n",
+ "display(Markdown(response.choices[0].message.content))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The audio summary is biased towards the content discussed during the speech, but comes out with much less structure than the video summary.\n",
+ "\n",
+ "#### Visual + Audio Summary\n",
+ "The Visual + Audio summary is generated by sending the model both the video frames and the audio transcript in a single request. With both inputs, the model is expected to produce a better summary, since it can perceive the entire video at once."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/markdown": [
+ "## Video Summary\n",
+ "\n",
+ "### Event Introduction\n",
+ "- **Title:** OpenAI Dev Day\n",
+ "- **Keynote Recap:** The event begins with a keynote recap, setting the stage for the announcements.\n",
+ "\n",
+ "### Venue and Audience\n",
+ "- **Location:** The event is held at a venue with a sign reading \"OpenAI DevDay.\"\n",
+ "- **Audience:** The venue is filled with attendees, eagerly awaiting the presentations.\n",
+ "\n",
+ "### Key Announcements\n",
+ "1. **GPT-4 Turbo:**\n",
+ " - **Launch:** Introduction of GPT-4 Turbo.\n",
+ " - **Features:** Supports up to 128,000 tokens of context.\n",
+ " - **JSON Mode:** Ensures responses in valid JSON format.\n",
+ " - **Function Calling:** Improved ability to call multiple functions and follow instructions.\n",
+ " - **Knowledge Update:** Knowledge up to April 2023, with ongoing improvements.\n",
+ " - **API Integration:** Available in the API along with DALL-E 3 and a new Text-to-Speech model.\n",
+ " - **Custom Models:** New program for creating custom models tailored to specific use cases.\n",
+ " - **Rate Limits:** Doubling tokens per minute for established GPT-4 customers, with options to request further changes.\n",
+ " - **Pricing:** GPT-4 Turbo is significantly cheaper than GPT-4 (3x less for prompt tokens, 2x less for completion tokens).\n",
+ "\n",
+ "2. **GPTs:**\n",
+ " - **Introduction:** Tailored versions of ChatGPT for specific purposes.\n",
+ " - **Features:** Combine instructions, expanded knowledge, and actions for better performance and control.\n",
+ " - **Ease of Use:** Can be programmed through conversation, no coding required.\n",
+ " - **Customization:** Options to create private GPTs, share publicly, or make them exclusive to a company.\n",
+ " - **GPT Store:** Launching later this month for sharing and discovering GPTs.\n",
+ "\n",
+ "3. **Assistance API:**\n",
+ " - **Features:** Includes persistent threads, built-in retrieval, code interpreter, and improved function calling.\n",
+ " - **Integration:** Designed to integrate intelligence into various applications, providing \"superpowers on demand.\"\n",
+ "\n",
+ "### Closing Remarks\n",
+ "- **Future Outlook:** The technology launched today is just the beginning, with more advancements in the pipeline.\n",
+ "- **Gratitude:** Thanks to the attendees and a promise of more exciting developments in the future.\n",
+ "\n",
+ "### Conclusion\n",
+ "- **Event End:** The event concludes with applause and a final thank you to the audience."
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "## Generate a summary with visual and audio\n",
+ "response = client.chat.completions.create(\n",
+ " model=MODEL,\n",
+ " messages=[\n",
+ " {\"role\": \"system\", \"content\":\"\"\"You are generating a video summary. Create a summary of the provided video and its transcript. Respond in Markdown.\"\"\"},\n",
+ " {\"role\": \"user\", \"content\": [\n",
+ " {\"type\": \"text\", \"text\": \"These are the frames from the video.\"},\n",
+ " *map(lambda x: {\"type\": \"image_url\", \n",
+ " \"image_url\": {\"url\": f'data:image/jpeg;base64,{x}', \"detail\": \"low\"}}, base64Frames),\n",
+ " {\"type\": \"text\", \"text\": f\"The audio transcription is: {transcription.text}\"}\n",
+ " ],\n",
+ " }\n",
+ " ],\n",
+ " temperature=0,\n",
+ ")\n",
+ "display(Markdown(response.choices[0].message.content))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "By combining the video frames and the audio, we get a much more detailed and comprehensive summary of the event, drawing on both the visual and audio elements of the video.\n",
+ "\n",
+ "### Example 2: Question and Answering\n",
+ "For the Q&A, we'll use the same approach as before to ask questions about our processed video, running the same three tests to demonstrate the benefit of combining input modalities:\n",
+ "1. Visual Q&A\n",
+ "2. Audio Q&A\n",
+ "3. Visual + Audio Q&A "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "QUESTION = \"Question: Why did Sam Altman have an example about raising windows and turning the radio on?\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/markdown": [
+ "Visual QA:Sam Altman used the example about raising windows and turning the radio on to demonstrate the function calling capabilities of the new model. The example illustrated how the model can interpret and execute specific commands by calling appropriate functions, showcasing its ability to handle complex tasks and integrate with external systems or APIs. This feature enhances the model's utility in practical applications by allowing it to perform actions based on user instructions."
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "qa_visual_response = client.chat.completions.create(\n",
+ " model=MODEL,\n",
+ " messages=[\n",
+ " {\"role\": \"system\", \"content\": \"Use the video to answer the provided question. Respond in Markdown.\"},\n",
+ " {\"role\": \"user\", \"content\": [\n",
+ " {\"type\": \"text\", \"text\": \"These are the frames from the video.\"},\n",
+ " *map(lambda x: {\"type\": \"image_url\", \"image_url\": {\"url\": f'data:image/jpeg;base64,{x}', \"detail\": \"low\"}}, base64Frames),\n",
+ " {\"type\": \"text\", \"text\": QUESTION}\n",
+ " ],\n",
+ " }\n",
+ " ],\n",
+ " temperature=0,\n",
+ ")\n",
+ "display(Markdown(\"Visual QA:\" + qa_visual_response.choices[0].message.content))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/markdown": [
+ "Audio QA:\n",
+ "The provided transcription does not include any mention of Sam Altman or an example about raising windows and turning the radio on. Therefore, I cannot provide an answer based on the given transcription."
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "qa_audio_response = client.chat.completions.create(\n",
+ " model=MODEL,\n",
+ " messages=[\n",
+ " {\"role\": \"system\", \"content\":\"\"\"Use the transcription to answer the provided question. Respond in Markdown.\"\"\"},\n",
+ " {\"role\": \"user\", \"content\": f\"The audio transcription is: {transcription.text}. \\n\\n {QUESTION}\"},\n",
+ " ],\n",
+ " temperature=0,\n",
+ ")\n",
+ "display(Markdown(\"Audio QA:\\n\" + qa_audio_response.choices[0].message.content))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/markdown": [
+ "Both QA:\n",
+ "Sam Altman used the example of raising windows and turning the radio on to demonstrate the improved function calling capabilities of GPT-4 Turbo. The example illustrated how the model can now handle multiple function calls more effectively and follow instructions better. In the demonstration, the model was able to interpret the command to raise the windows and turn the radio on, showing how it can execute multiple actions in response to a single prompt. This highlights the enhanced ability of GPT-4 Turbo to manage complex tasks and provide more accurate and useful responses."
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "qa_both_response = client.chat.completions.create(\n",
+ " model=MODEL,\n",
+ " messages=[\n",
+ " {\"role\": \"system\", \"content\":\"\"\"Use the video and transcription to answer the provided question.\"\"\"},\n",
+ " {\"role\": \"user\", \"content\": [\n",
+ " {\"type\": \"text\", \"text\": \"These are the frames from the video.\"},\n",
+ " *map(lambda x: {\"type\": \"image_url\", \n",
+ " \"image_url\": {\"url\": f'data:image/jpeg;base64,{x}', \"detail\": \"low\"}}, base64Frames),\n",
+ " {\"type\": \"text\", \"text\": f\"The audio transcription is: {transcription.text}\"},\n",
+ " {\"type\": \"text\", \"text\": QUESTION}\n",
+ " ],\n",
+ " }\n",
+ " ],\n",
+ " temperature=0,\n",
+ ")\n",
+ "display(Markdown(\"Both QA:\\n\" + qa_both_response.choices[0].message.content))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Comparing the three answers, the most accurate one comes from using both the audio and the visuals of the video. Sam Altman did not discuss raising the windows or turning on the radio during the keynote; instead, he referred to the model's improved ability to execute multiple functions in a single request while the examples were shown behind him.\n",
+ "\n",
+ "## Conclusion\n",
+ "Integrating multiple input modalities, such as audio, visual, and text, significantly enhances the model's performance on a diverse range of tasks. This multimodal approach allows for more comprehensive understanding and interaction, more closely mirroring how humans perceive and process information.\n",
+ "\n",
+ "Currently, GPT-4o in the API supports text and image inputs, with audio capabilities coming soon."
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": ".venv",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.11.8"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/registry.yaml b/registry.yaml
index 017b397..0c76471 100644
--- a/registry.yaml
+++ b/registry.yaml
@@ -1205,7 +1205,7 @@
- guardrails
- title: How to combine GPT4 with Vision with RAG to create a clothing matchmaker app
- path: examples/How_to_combine_GPT4v_with_RAG_Outfit_Assistant.ipynb
+ path: examples/How_to_combine_GPT4o_with_RAG_Outfit_Assistant.ipynb
date: 2024-02-16
authors:
- teomusatoiu
@@ -1294,4 +1294,14 @@
- colin-openai
tags:
- completions
- - functions
\ No newline at end of file
+ - functions
+
+- title: Introduction to GPT-4o
+ path: examples/gpt4o/introduction_to_gpt4o.ipynb
+ date: 2024-05-13
+ authors:
+ - justonf
+ tags:
+ - completions
+ - vision
+ - whisper
\ No newline at end of file