"# Question Answering in Weaviate with OpenAI Q&A module\n",
"\n",
"This notebook is prepared for a scenario where:\n",
"* Your data is not vectorized\n",
"* You want to run Q&A ([learn more](https://weaviate.io/developers/weaviate/modules/reader-generator-modules/qna-openai)) on your data based on the [OpenAI completions](https://beta.openai.com/docs/api-reference/completions) endpoint.\n",
"* You want to use Weaviate with the OpenAI module ([text2vec-openai](https://weaviate.io/developers/weaviate/modules/retriever-vectorizer-modules/text2vec-openai)), to generate vector embeddings for you.\n",
"\n",
"This notebook takes you through a simple flow to set up a Weaviate instance, connect to it (with OpenAI API key), configure data schema, import data (which will automatically generate vector embeddings for your data), and run question answering.\n",
"\n",
"## What is Weaviate\n",
"\n",
"Weaviate is an open-source vector search engine that stores data objects together with their vectors. This allows for combining vector search with structured filtering.\n",
"\n",
"Weaviate uses KNN algorithms to create an vector-optimized index, which allows your queries to run extremely fast. Learn more [here](https://weaviate.io/blog/why-is-vector-search-so-fast).\n",
"Additionally, Weaviate has a [REST layer](https://weaviate.io/developers/weaviate/api/rest/objects). Basically you can call Weaviate from any language that supports REST requests."
"All Weaviate instances come equipped with the [text2vec-openai](https://weaviate.io/developers/weaviate/modules/retriever-vectorizer-modules/text2vec-openai) and the [qna-openai](https://weaviate.io/developers/weaviate/modules/reader-generator-modules/qna-openai) modules.\n",
"The first module is responsible for handling vectorization at import (or any CRUD operations) and when you run a search query. The second module communicates with the OpenAI completions endpoint.\n",
"\n",
"### No need to manually vectorize data\n",
"This is great news for you. With [text2vec-openai](https://weaviate.io/developers/weaviate/modules/retriever-vectorizer-modules/text2vec-openai) you don't need to manually vectorize your data, as Weaviate will call OpenAI for you whenever necessary.\n",
"\n",
"All you need to do is:\n",
"1. provide your OpenAI API Key – when you connected to the Weaviate Client\n",
"2. define which OpenAI vectorizer to use in your Schema"
]
},
{
"cell_type": "markdown",
"id": "f1a618c5",
"metadata": {},
"source": [
"## Prerequisites\n",
"\n",
"Before we start this project, we need setup the following:\n",
"\n",
"* create a `Weaviate` instance\n",
"* install libraries\n",
" * `weaviate-client`\n",
" * `datasets`\n",
" * `apache-beam`\n",
"* get your [OpenAI API key](https://beta.openai.com/account/api-keys)\n",
"To create a Weaviate instance we have 2 options:\n",
"\n",
"1. (Recommended path) [Weaviate Cloud Service](https://console.weaviate.io/) – to host your Weaviate instance in the cloud. The free sandbox should be more than enough for this cookbook.\n",
"2. Install and run Weaviate locally with Docker.\n",
"\n",
"#### Option 1 – WCS Installation Steps\n",
"\n",
"Use [Weaviate Cloud Service](https://console.weaviate.io/) (WCS) to create a free Weaviate cluster.\n",
"1. create a free account and/or login to [WCS](https://console.weaviate.io/)\n",
"2. create a `Weaviate Cluster` with the following settings:\n",
" * Sandbox: `Sandbox Free`\n",
" * Weaviate Version: Use default (latest)\n",
" * OIDC Authentication: `Disabled`\n",
"3. your instance should be ready in a minute or two\n",
"4. make a note of the `Cluster Id`. The link will take you to the full path of your cluster (you will need it later to connect to it). It should be something like: `https://your-project-name.weaviate.network` \n",
"\n",
"#### Option 2 – local Weaviate instance with Docker\n",
"\n",
"Install and run Weaviate locally with Docker.\n",
"1. Download the [./docker-compose.yml](./docker-compose.yml) file\n",
"3. Once this is ready, your instance should be available at [http://localhost:8080](http://localhost:8080)\n",
"\n",
"Note. To shut down your docker instance you can call: `docker-compose down`\n",
"\n",
"##### Learn more\n",
"To learn more, about using Weaviate with Docker see the [installation documentation](https://weaviate.io/developers/weaviate/installation/docker-compose)."
"Before running this project make sure to have the following libraries:\n",
"\n",
"### Weaviate Python client\n",
"\n",
"The [Weaviate Python client](https://weaviate.io/developers/weaviate/client-libraries/python) allows you to communicate with your Weaviate instance from your Python project.\n",
"\n",
"### datasets & apache-beam\n",
"\n",
"To load sample data, you need the `datasets` library and its' dependency `apache-beam`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2b04113f",
"metadata": {},
"outputs": [],
"source": [
"# Install the Weaviate client for Python\n",
"!pip install weaviate-client>3.11.0\n",
"\n",
"# Install datasets and apache-beam to load the sample datasets\n",
" auth_client_secret=weaviate.auth.AuthApiKey(api_key=\"<YOUR-WEAVIATE-API-KEY>\"), # comment out this line if you are not using authentication for your Weaviate instance (i.e. for locally deployed instances)\n",
"Thanks for following along, you're now equipped to set up your own vector databases and use embeddings to do all kinds of cool things - enjoy! For more complex use cases please continue to work through other cookbook examples in this repo."