"# Using Weaviate with OpenAI vectorize module for Embeddings Search\n",
"\n",
"This notebook is prepared for a scenario where:\n",
"* Your data is not vectorized\n",
"* You want to run Vector Search on your data\n",
"* You want to use Weaviate with the OpenAI module ([text2vec-openai](https://weaviate.io/developers/weaviate/modules/retriever-vectorizer-modules/text2vec-openai)), to generate vector embeddings for you.\n",
"\n",
"This notebook takes you through a simple flow to set up a Weaviate instance, connect to it (with OpenAI API key), configure data schema, import data (which will automatically generate vector embeddings for your data), and run semantic search.\n",
"\n",
"This is a common requirement for customers who want to store and search our embeddings with their own data in a secure environment to support production use cases such as chatbots, topic modelling and more.\n",
"\n",
"## What is Weaviate\n",
"\n",
"Weaviate is an open-source vector search engine that stores data objects together with their vectors. This allows for combining vector search with structured filtering.\n",
"\n",
"Weaviate uses KNN algorithms to create an vector-optimized index, which allows your queries to run extremely fast. Learn more [here](https://weaviate.io/blog/why-is-vector-search-so-fast).\n",
"Additionally, Weaviate has a [REST layer](https://weaviate.io/developers/weaviate/api/rest/objects). Basically you can call Weaviate from any language that supports REST requests."
"Once you've run through this notebook you should have a basic understanding of how to setup and use vector databases, and can move on to more complex use cases making use of our embeddings."
"All Weaviate instances come equipped with the [text2vec-openai](https://weaviate.io/developers/weaviate/modules/retriever-vectorizer-modules/text2vec-openai) module.\n",
"This is great news for you. With [text2vec-openai](https://weaviate.io/developers/weaviate/modules/retriever-vectorizer-modules/text2vec-openai) you don't need to manually vectorize your data, as Weaviate will call OpenAI for you whenever necessary.\n",
"\n",
"All you need to do is:\n",
"1. provide your OpenAI API Key – when you connected to the Weaviate Client\n",
"2. define which OpenAI vectorizer to use in your Schema"
]
},
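The two steps above might look roughly like the following with the v3 `weaviate-client`. This is a sketch under stated assumptions rather than the notebook's exact code: the class name `Article`, its properties, and the helper names are hypothetical, and the `moduleConfig` values are just one possible model choice.

```python
def openai_headers(api_key: str) -> dict:
    """Step 1: header that lets Weaviate call OpenAI on your behalf."""
    return {"X-OpenAI-Api-Key": api_key}


def article_class() -> dict:
    """Step 2: a class definition that selects the text2vec-openai vectorizer."""
    return {
        "class": "Article",  # hypothetical class name
        "vectorizer": "text2vec-openai",
        "moduleConfig": {
            # Assumed model settings; adjust to the model you want to use.
            "text2vec-openai": {"model": "ada", "modelVersion": "002", "type": "text"}
        },
        "properties": [  # hypothetical properties
            {"name": "title", "dataType": ["text"]},
            {"name": "content", "dataType": ["text"]},
        ],
    }


def create_schema(client) -> None:
    """Register the class on an already connected v3 weaviate.Client."""
    client.schema.create_class(article_class())
```

The header dictionary would be passed as `additional_headers` when the client is created, so that Weaviate can request embeddings from OpenAI during import and search.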
{
"cell_type": "markdown",
"id": "f1a618c5",
"metadata": {},
"source": [
"## Prerequisites\n",
"\n",
"Before we start this project, we need setup the following:\n",
"\n",
"* create a `Weaviate` instance\n",
"* install libraries\n",
" * `weaviate-client`\n",
" * `datasets`\n",
" * `apache-beam`\n",
"* get your [OpenAI API key](https://beta.openai.com/account/api-keys)\n",
"To create a Weaviate instance we have 2 options:\n",
"\n",
"1. (Recommended path) [Weaviate Cloud Service](https://console.weaviate.io/) – to host your Weaviate instance in the cloud. The free sandbox should be more than enough for this cookbook.\n",
"2. Install and run Weaviate locally with Docker.\n",
"\n",
"#### Option 1 – WCS Installation Steps\n",
"\n",
"Use [Weaviate Cloud Service](https://console.weaviate.io/) (WCS) to create a free Weaviate cluster.\n",
"1. create a free account and/or login to [WCS](https://console.weaviate.io/)\n",
"2. create a `Weaviate Cluster` with the following settings:\n",
" * Sandbox: `Sandbox Free`\n",
" * Weaviate Version: Use default (latest)\n",
" * OIDC Authentication: `Disabled`\n",
"3. your instance should be ready in a minute or two\n",
"4. make a note of the `Cluster Id`. The link will take you to the full path of your cluster (you will need it later to connect to it). It should be something like: `https://your-project-name.weaviate.network` \n",
"\n",
"#### Option 2 – local Weaviate instance with Docker\n",
"\n",
"Install and run Weaviate locally with Docker.\n",
"1. Download the [./docker-compose.yml](./docker-compose.yml) file\n",
"3. Once this is ready, your instance should be available at [http://localhost:8080](http://localhost:8080)\n",
"\n",
"Note. To shut down your docker instance you can call: `docker-compose down`\n",
"\n",
"##### Learn more\n",
"To learn more, about using Weaviate with Docker see the [installation documentation](https://weaviate.io/developers/weaviate/installation/docker-compose)."
"Before running this project make sure to have the following libraries:\n",
"\n",
"### Weaviate Python client\n",
"\n",
"The [Weaviate Python client](https://weaviate.io/developers/weaviate/client-libraries/python) allows you to communicate with your Weaviate instance from your Python project.\n",
" auth_client_secret=weaviate.auth.AuthApiKey(api_key=\"<YOUR-WEAVIATE-API-KEY>\"), # comment out this line if you are not using authentication for your Weaviate instance (i.e. for locally deployed instances)\n",
"Thanks for following along, you're now equipped to set up your own vector databases and use embeddings to do all kinds of cool things - enjoy! For more complex use cases please continue to work through other cookbook examples in this repo."