{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "cb1537e6", "metadata": {}, "source": [ "# Using Weaviate with Generative OpenAI module for Generative Search\n", "\n", "This notebook is prepared for a scenario where:\n", "* Your data is already in Weaviate\n", "* You want to use Weaviate with the Generative OpenAI module ([generative-openai](https://weaviate.io/developers/weaviate/modules/reader-generator-modules/generative-openai)).\n", "\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "f1a618c5", "metadata": {}, "source": [ "## Prerequisites\n", "\n", "This cookbook only coveres Generative Search examples, however, it doesn't cover the configuration and data imports.\n", "\n", "In order to make the most of this cookbook, please complete the [Getting Started cookbook](./getting-started-with-weaviate-and-openai.ipynb) first, where you will learn the essentials of working with Weaviate and import the demo data.\n", "\n", "Checklist:\n", "* completed [Getting Started cookbook](./getting-started-with-weaviate-and-openai.ipynb),\n", "* crated a `Weaviate` instance,\n", "* imported data into your `Weaviate` instance,\n", "* you have an [OpenAI API key](https://beta.openai.com/account/api-keys)" ] }, { "cell_type": "markdown", "id": "36fe86f4", "metadata": {}, "source": [ "===========================================================\n", "## Prepare your OpenAI API key\n", "\n", "The `OpenAI API key` is used for vectorization of your data at import, and for running queries.\n", "\n", "If you don't have an OpenAI API key, you can get one from [https://beta.openai.com/account/api-keys](https://beta.openai.com/account/api-keys).\n", "\n", "Once you get your key, please add it to your environment variables as `OPENAI_API_KEY`." ] }, { "cell_type": "code", "execution_count": null, "id": "43395339", "metadata": {}, "outputs": [], "source": [ "# Export OpenAI API Key\n", "!export OPENAI_API_KEY=\"your key\"" ] }, { "cell_type": "code", "execution_count": null, "id": "88be138c", "metadata": {}, "outputs": [], "source": [ "# Test that your OpenAI API key is correctly set as an environment variable\n", "# Note. if you run this notebook locally, you will need to reload your terminal and the notebook for the env variables to be live.\n", "import os\n", "\n", "# Note. alternatively you can set a temporary env variable like this:\n", "# os.environ[\"OPENAI_API_KEY\"] = 'your-key-goes-here'\n", "\n", "if os.getenv(\"OPENAI_API_KEY\") is not None:\n", " print (\"OPENAI_API_KEY is ready\")\n", "else:\n", " print (\"OPENAI_API_KEY environment variable not found\")" ] }, { "cell_type": "markdown", "id": "91df4d5b", "metadata": {}, "source": [ "## Connect to your Weaviate instance\n", "\n", "In this section, we will:\n", "\n", "1. test env variable `OPENAI_API_KEY` – **make sure** you completed the step in [#Prepare-your-OpenAI-API-key](#Prepare-your-OpenAI-API-key)\n", "2. connect to your Weaviate with your `OpenAI API Key`\n", "3. and test the client connection\n", "\n", "### The client \n", "\n", "After this step, the `client` object will be used to perform all Weaviate-related operations." ] }, { "cell_type": "code", "execution_count": null, "id": "cc662c1b", "metadata": {}, "outputs": [], "source": [ "import weaviate\n", "from datasets import load_dataset\n", "import os\n", "\n", "# Connect to your Weaviate instance\n", "client = weaviate.Client(\n", " url=\"https://your-wcs-instance-name.weaviate.network/\",\n", " # url=\"http://localhost:8080/\",\n", " auth_client_secret=weaviate.auth.AuthApiKey(api_key=\"\"), # comment out this line if you are not using authentication for your Weaviate instance (i.e. for locally deployed instances)\n", " additional_headers={\n", " \"X-OpenAI-Api-Key\": os.getenv(\"OPENAI_API_KEY\")\n", " }\n", ")\n", "\n", "# Check if your instance is live and ready\n", "# This should return `True`\n", "client.is_ready()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "ceb14da9", "metadata": {}, "source": [ "## Generative Search\n", "Weaviate offers a [Generative Search OpenAI](https://weaviate.io/developers/weaviate/modules/reader-generator-modules/generative-openai) module, which generates responses based on the data stored in your Weaviate instance.\n", "\n", "The way you construct a generative search query is very similar to a standard semantic search query in Weaviate. \n", "\n", "For example:\n", "* search in \"Articles\", \n", "* return \"title\", \"content\", \"url\"\n", "* look for objects related to \"football clubs\"\n", "* limit results to 5 objects\n", "\n", "```\n", " result = (\n", " client.query\n", " .get(\"Articles\", [\"title\", \"content\", \"url\"])\n", " .with_near_text(\"concepts\": \"football clubs\")\n", " .with_limit(5)\n", " # generative query will go here\n", " .do()\n", " )\n", "```\n", "\n", "Now, you can add `with_generate()` function to apply generative transformation. `with_generate` takes either:\n", "- `single_prompt` - to generate a response for each returned object,\n", "- `grouped_task` – to generate a single response from all returned objects.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "51559251", "metadata": {}, "outputs": [], "source": [ "def generative_search_per_item(query, collection_name):\n", " prompt = \"Summarize in a short tweet the following content: {content}\"\n", "\n", " result = (\n", " client.query\n", " .get(collection_name, [\"title\", \"content\", \"url\"])\n", " .with_near_text({ \"concepts\": [query], \"distance\": 0.7 })\n", " .with_limit(5)\n", " .with_generate(single_prompt=prompt)\n", " .do()\n", " )\n", " \n", " # Check for errors\n", " if (\"errors\" in result):\n", " print (\"\\033[91mYou probably have run out of OpenAI API calls for the current minute – the limit is set at 60 per minute.\")\n", " raise Exception(result[\"errors\"][0]['message'])\n", " \n", " return result[\"data\"][\"Get\"][collection_name]" ] }, { "cell_type": "code", "execution_count": null, "id": "a4604726", "metadata": {}, "outputs": [], "source": [ "query_result = generative_search_per_item(\"football clubs\", \"Article\")\n", "\n", "for i, article in enumerate(query_result):\n", " print(f\"{i+1}. { article['title']}\")\n", " print(article['_additional']['generate']['singleResult']) # print generated response\n", " print(\"-----------------------\")" ] }, { "cell_type": "code", "execution_count": 79, "id": "a45ea160", "metadata": {}, "outputs": [], "source": [ "def generative_search_group(query, collection_name):\n", " generateTask = \"Explain what these have in common\"\n", "\n", " result = (\n", " client.query\n", " .get(collection_name, [\"title\", \"content\", \"url\"])\n", " .with_near_text({ \"concepts\": [query], \"distance\": 0.7 })\n", " .with_generate(grouped_task=generateTask)\n", " .with_limit(5)\n", " .do()\n", " )\n", " \n", " # Check for errors\n", " if (\"errors\" in result):\n", " print (\"\\033[91mYou probably have run out of OpenAI API calls for the current minute – the limit is set at 60 per minute.\")\n", " raise Exception(result[\"errors\"][0]['message'])\n", " \n", " return result[\"data\"][\"Get\"][collection_name]" ] }, { "cell_type": "code", "execution_count": null, "id": "11e0dad2", "metadata": {}, "outputs": [], "source": [ "query_result = generative_search_group(\"football clubs\", \"Article\")\n", "\n", "print (query_result[0]['_additional']['generate']['groupedResult'])" ] }, { "cell_type": "markdown", "id": "2007be48", "metadata": {}, "source": [ "Thanks for following along, you're now equipped to set up your own vector databases and use embeddings to do all kinds of cool things - enjoy! For more complex use cases please continue to work through other cookbook examples in this repo." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" }, "vscode": { "interpreter": { "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" } } }, "nbformat": 4, "nbformat_minor": 5 }