"# Question Answering in Weaviate with OpenAI Q&A module\n",
"\n",
"This notebook is prepared for a scenario where:\n",
"* Your data is not vectorized\n",
"* You want to run Q&A ([learn more](https://weaviate.io/developers/weaviate/modules/reader-generator-modules/qna-openai)) on your data based on the [OpenAI completions](https://beta.openai.com/docs/api-reference/completions) endpoint.\n",
"* You want to use Weaviate with the OpenAI module ([text2vec-openai](https://weaviate.io/developers/weaviate/modules/retriever-vectorizer-modules/text2vec-openai)), to generate vector embeddings for you.\n",
"\n",
"This notebook takes you through a simple flow to set up a Weaviate instance, connect to it (with OpenAI API key), configure data schema, import data (which will automatically generate vector embeddings for your data), and run question answering.\n",
"\n",
"## What is Weaviate\n",
"\n",
"Weaviate is an open-source vector search engine that stores data objects together with their vectors. This allows for combining vector search with structured filtering.\n",
"\n",
"Weaviate uses KNN algorithms to create an vector-optimized index, which allows your queries to run extremely fast. Learn more [here](https://weaviate.io/blog/why-is-vector-search-so-fast).\n",
"Additionally, Weaviate has a [REST layer](https://weaviate.io/developers/weaviate/api/rest/objects). Basically you can call Weaviate from any language that supports REST requests."
"All Weaviate instances come equipped with the [text2vec-openai](https://weaviate.io/developers/weaviate/modules/retriever-vectorizer-modules/text2vec-openai) and the [qna-openai](https://weaviate.io/developers/weaviate/modules/reader-generator-modules/qna-openai) modules.\n",
"The first module is responsible for handling vectorization at import (or any CRUD operations) and when you run a search query. The second module communicates with the OpenAI completions endpoint.\n",
"\n",
"### No need to manually vectorize data\n",
"This is great news for you. With [text2vec-openai](https://weaviate.io/developers/weaviate/modules/retriever-vectorizer-modules/text2vec-openai) you don't need to manually vectorize your data, as Weaviate will call OpenAI for you whenever necessary.\n",
"\n",
"All you need to do is:\n",
"1. provide your OpenAI API Key – when you connected to the Weaviate Client\n",
"2. define which OpenAI vectorizer to use in your Schema"
]
},
{
"cell_type": "markdown",
"id": "f1a618c5",
"metadata": {},
"source": [
"## Prerequisites\n",
"\n",
"Before we start this project, we need setup the following:\n",
"\n",
"* create a `Weaviate` instance\n",
"* install libraries\n",
" * `weaviate-client`\n",
" * `datasets`\n",
" * `apache-beam`\n",
"* get your [OpenAI API key](https://beta.openai.com/account/api-keys)\n",
"To create a Weaviate instance we have 2 options:\n",
"\n",
"1. (Recommended path) [Weaviate Cloud Service](https://console.weaviate.io/) – to host your Weaviate instance in the cloud. The free sandbox should be more than enough for this cookbook.\n",
"2. Install and run Weaviate locally with Docker.\n",
"\n",
"#### Option 1 – WCS Installation Steps\n",
"\n",
"Use [Weaviate Cloud Service](https://console.weaviate.io/) (WCS) to create a free Weaviate cluster.\n",
"1. create a free account and/or login to [WCS](https://console.weaviate.io/)\n",
"2. create a `Weaviate Cluster` with the following settings:\n",
" * Sandbox: `Sandbox Free`\n",
" * Weaviate Version: Use default (latest)\n",
" * OIDC Authentication: `Disabled`\n",
"3. your instance should be ready in a minute or two\n",
"4. make a note of the `Cluster Id`. The link will take you to the full path of your cluster (you will need it later to connect to it). It should be something like: `https://your-project-name.weaviate.network` \n",
"\n",
"#### Option 2 – local Weaviate instance with Docker\n",
"\n",
"Install and run Weaviate locally with Docker.\n",
"1. Download the [./docker-compose.yml](./docker-compose.yml) file\n",
"3. Once this is ready, your instance should be available at [http://localhost:8080](http://localhost:8080)\n",
"\n",
"Note. To shut down your docker instance you can call: `docker-compose down`\n",
"\n",
"##### Learn more\n",
"To learn more, about using Weaviate with Docker see the [installation documentation](https://weaviate.io/developers/weaviate/installation/docker-compose)."
"Before running this project make sure to have the following libraries:\n",
"\n",
"### Weaviate Python client\n",
"\n",
"The [Weaviate Python client](https://weaviate.io/developers/weaviate/client-libraries/python) allows you to communicate with your Weaviate instance from your Python project.\n",
"\n",
"### datasets & apache-beam\n",
"\n",
"To load sample data, you need the `datasets` library and its' dependency `apache-beam`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2b04113f",
"metadata": {},
"outputs": [],
"source": [
"# Install the Weaviate client for Python\n",
"!pip install weaviate-client>3.11.0\n",
"\n",
"# Install datasets and apache-beam to load the sample datasets\n",
"Thanks for following along, you're now equipped to set up your own vector databases and use embeddings to do all kinds of cool things - enjoy! For more complex use cases please continue to work through other cookbook examples in this repo."