mirror of
https://github.com/james-m-jordan/openai-cookbook.git
synced 2025-05-09 19:32:38 +00:00

* add Weaviate authentication to client configuration * add a cookbook for generative search * fix name
276 lines
9.2 KiB
Plaintext
276 lines
9.2 KiB
Plaintext
{
|
||
"cells": [
|
||
{
|
||
"attachments": {},
|
||
"cell_type": "markdown",
|
||
"id": "cb1537e6",
|
||
"metadata": {},
|
||
"source": [
|
||
"# Using Weaviate with Generative OpenAI module for Generative Search\n",
|
||
"\n",
|
||
"This notebook is prepared for a scenario where:\n",
|
||
"* Your data is already in Weaviate\n",
|
||
"* You want to use Weaviate with the Generative OpenAI module ([generative-openai](https://weaviate.io/developers/weaviate/modules/reader-generator-modules/generative-openai)).\n",
|
||
"\n"
|
||
]
|
||
},
|
||
{
|
||
"attachments": {},
|
||
"cell_type": "markdown",
|
||
"id": "f1a618c5",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Prerequisites\n",
|
||
"\n",
|
||
"This cookbook only coveres Generative Search examples, however, it doesn't cover the configuration and data imports.\n",
|
||
"\n",
|
||
"In order to make the most of this cookbook, please complete the [Getting Started cookbook](./getting-started-with-weaviate-and-openai.ipynb) firts, where you will learn the essentials of working with Weaviate and import the demo data.\n",
|
||
"\n",
|
||
"Checklist:\n",
|
||
"* completed [Getting Started cookbook](./getting-started-with-weaviate-and-openai.ipynb),\n",
|
||
"* crated a `Weaviate` instance,\n",
|
||
"* imported data into your `Weaviate` instance,\n",
|
||
"* you have an [OpenAI API key](https://beta.openai.com/account/api-keys)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "36fe86f4",
|
||
"metadata": {},
|
||
"source": [
|
||
"===========================================================\n",
|
||
"## Prepare your OpenAI API key\n",
|
||
"\n",
|
||
"The `OpenAI API key` is used for vectorization of your data at import, and for running queries.\n",
|
||
"\n",
|
||
"If you don't have an OpenAI API key, you can get one from [https://beta.openai.com/account/api-keys](https://beta.openai.com/account/api-keys).\n",
|
||
"\n",
|
||
"Once you get your key, please add it to your environment variables as `OPENAI_API_KEY`."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "43395339",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Export OpenAI API Key\n",
|
||
"!export OPENAI_API_KEY=\"your key\""
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "88be138c",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Test that your OpenAI API key is correctly set as an environment variable\n",
|
||
"# Note. if you run this notebook locally, you will need to reload your terminal and the notebook for the env variables to be live.\n",
|
||
"import os\n",
|
||
"\n",
|
||
"# Note. alternatively you can set a temporary env variable like this:\n",
|
||
"# os.environ[\"OPENAI_API_KEY\"] = 'your-key-goes-here'\n",
|
||
"\n",
|
||
"if os.getenv(\"OPENAI_API_KEY\") is not None:\n",
|
||
" print (\"OPENAI_API_KEY is ready\")\n",
|
||
"else:\n",
|
||
" print (\"OPENAI_API_KEY environment variable not found\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "91df4d5b",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Connect to your Weaviate instance\n",
|
||
"\n",
|
||
"In this section, we will:\n",
|
||
"\n",
|
||
"1. test env variable `OPENAI_API_KEY` – **make sure** you completed the step in [#Prepare-your-OpenAI-API-key](#Prepare-your-OpenAI-API-key)\n",
|
||
"2. connect to your Weaviate with your `OpenAI API Key`\n",
|
||
"3. and test the client connection\n",
|
||
"\n",
|
||
"### The client \n",
|
||
"\n",
|
||
"After this step, the `client` object will be used to perform all Weaviate-related operations."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "cc662c1b",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"import weaviate\n",
|
||
"from datasets import load_dataset\n",
|
||
"import os\n",
|
||
"\n",
|
||
"# Connect to your Weaviate instance\n",
|
||
"client = weaviate.Client(\n",
|
||
" url=\"https://your-wcs-instance-name.weaviate.network/\",\n",
|
||
" # url=\"http://localhost:8080/\",\n",
|
||
" auth_client_secret=weaviate.auth.AuthApiKey(api_key=\"<YOUR-WEAVIATE-API-KEY>\"), # comment out this line if you are not using authentication for your Weaviate instance (i.e. for locally deployed instances)\n",
|
||
" additional_headers={\n",
|
||
" \"X-OpenAI-Api-Key\": os.getenv(\"OPENAI_API_KEY\")\n",
|
||
" }\n",
|
||
")\n",
|
||
"\n",
|
||
"# Check if your instance is live and ready\n",
|
||
"# This should return `True`\n",
|
||
"client.is_ready()"
|
||
]
|
||
},
|
||
{
|
||
"attachments": {},
|
||
"cell_type": "markdown",
|
||
"id": "ceb14da9",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Generative Search\n",
|
||
"Weaviate offers a [Generative Search OpenAI](https://weaviate.io/developers/weaviate/modules/reader-generator-modules/generative-openai) module, which generates responses based on the data stored in your Weaviate instance.\n",
|
||
"\n",
|
||
"The way you construct a generative search query is very similar to a standard semantic search query in Weaviate. \n",
|
||
"\n",
|
||
"For example:\n",
|
||
"* search in \"Articles\", \n",
|
||
"* return \"title\", \"content\", \"url\"\n",
|
||
"* look for objects related to \"football clubs\"\n",
|
||
"* limit results to 5 objects\n",
|
||
"\n",
|
||
"```\n",
|
||
" result = (\n",
|
||
" client.query\n",
|
||
" .get(\"Articles\", [\"title\", \"content\", \"url\"])\n",
|
||
" .with_near_text(\"concepts\": \"football clubs\")\n",
|
||
" .with_limit(5)\n",
|
||
" # generative query will go here\n",
|
||
" .do()\n",
|
||
" )\n",
|
||
"```\n",
|
||
"\n",
|
||
"Now, you can add `with_generate()` function to apply generative transformation. `with_generate` takes either:\n",
|
||
"- `single_prompt` - to generate a response for each returned object,\n",
|
||
"- `grouped_task` – to generate a single response from all returned objects.\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "51559251",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"def generative_search_per_item(query, collection_name):\n",
|
||
" prompt = \"Summarize in a short tweet the following content: {content}\"\n",
|
||
"\n",
|
||
" result = (\n",
|
||
" client.query\n",
|
||
" .get(collection_name, [\"title\", \"content\", \"url\"])\n",
|
||
" .with_near_text({ \"concepts\": [query], \"distance\": 0.7 })\n",
|
||
" .with_limit(5)\n",
|
||
" .with_generate(single_prompt=prompt)\n",
|
||
" .do()\n",
|
||
" )\n",
|
||
" \n",
|
||
" # Check for errors\n",
|
||
" if (\"errors\" in result):\n",
|
||
" print (\"\\033[91mYou probably have run out of OpenAI API calls for the current minute – the limit is set at 60 per minute.\")\n",
|
||
" raise Exception(result[\"errors\"][0]['message'])\n",
|
||
" \n",
|
||
" return result[\"data\"][\"Get\"][collection_name]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "a4604726",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"query_result = generative_search_per_item(\"football clubs\", \"Article\")\n",
|
||
"\n",
|
||
"for i, article in enumerate(query_result):\n",
|
||
" print(f\"{i+1}. { article['title']}\")\n",
|
||
" print(article['_additional']['generate']['singleResult']) # print generated response\n",
|
||
" print(\"-----------------------\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 79,
|
||
"id": "a45ea160",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"def generative_search_group(query, collection_name):\n",
|
||
" generateTask = \"Explain what these have in common\"\n",
|
||
"\n",
|
||
" result = (\n",
|
||
" client.query\n",
|
||
" .get(collection_name, [\"title\", \"content\", \"url\"])\n",
|
||
" .with_near_text({ \"concepts\": [query], \"distance\": 0.7 })\n",
|
||
" .with_generate(grouped_task=generateTask)\n",
|
||
" .with_limit(5)\n",
|
||
" .do()\n",
|
||
" )\n",
|
||
" \n",
|
||
" # Check for errors\n",
|
||
" if (\"errors\" in result):\n",
|
||
" print (\"\\033[91mYou probably have run out of OpenAI API calls for the current minute – the limit is set at 60 per minute.\")\n",
|
||
" raise Exception(result[\"errors\"][0]['message'])\n",
|
||
" \n",
|
||
" return result[\"data\"][\"Get\"][collection_name]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "11e0dad2",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"query_result = generative_search_group(\"football clubs\", \"Article\")\n",
|
||
"\n",
|
||
"print (query_result[0]['_additional']['generate']['groupedResult'])"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "2007be48",
|
||
"metadata": {},
|
||
"source": [
|
||
"Thanks for following along, you're now equipped to set up your own vector databases and use embeddings to do all kinds of cool things - enjoy! For more complex use cases please continue to work through other cookbook examples in this repo."
|
||
]
|
||
}
|
||
],
|
||
"metadata": {
|
||
"kernelspec": {
|
||
"display_name": "Python 3 (ipykernel)",
|
||
"language": "python",
|
||
"name": "python3"
|
||
},
|
||
"language_info": {
|
||
"codemirror_mode": {
|
||
"name": "ipython",
|
||
"version": 3
|
||
},
|
||
"file_extension": ".py",
|
||
"mimetype": "text/x-python",
|
||
"name": "python",
|
||
"nbconvert_exporter": "python",
|
||
"pygments_lexer": "ipython3",
|
||
"version": "3.9.12"
|
||
},
|
||
"vscode": {
|
||
"interpreter": {
|
||
"hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6"
|
||
}
|
||
}
|
||
},
|
||
"nbformat": 4,
|
||
"nbformat_minor": 5
|
||
}
|