{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "cb1537e6", "metadata": {}, "source": [ "# Using Redis as a Vector Database with OpenAI\n", "\n", "This notebook provides an introduction to using Redis as a vector database with OpenAI embeddings. Redis is a scalable, real-time database that can be used as a vector database with the [RediSearch module](https://oss.redislabs.com/redisearch/). The RediSearch module allows you to index and search for vectors in Redis. This notebook will show you how to use the RediSearch module to index and search for vectors created with the OpenAI API and stored in Redis.\n", "\n", "### What is Redis?\n", "\n", "Most developers from a web services background are probably familiar with Redis. At its core, Redis is an open-source key-value store that can be used as a cache, message broker, and database. Developers choose Redis because it is fast, has a large ecosystem of client libraries, and has been deployed by major enterprises for years.\n", "\n", "In addition to its traditional uses, Redis also provides [Redis Modules](https://redis.io/modules), which are a way to extend Redis with new data types and commands. Example modules include [RedisJSON](https://redis.io/docs/stack/json/), [RedisTimeSeries](https://redis.io/docs/stack/timeseries/), [RedisBloom](https://redis.io/docs/stack/bloom/) and [RediSearch](https://redis.io/docs/stack/search/).\n", "\n", "### What is RediSearch?\n", "\n", "RediSearch is a [Redis module](https://redis.io/modules) that provides querying, secondary indexing, full-text search and vector search for Redis. To use RediSearch, you first declare indexes on your Redis data. You can then use the RediSearch clients to query that data. For more information on the feature set of RediSearch, see the [README](./README.md) or the [RediSearch documentation](https://redis.io/docs/stack/search/).\n", "\n", "### Deployment options\n", "\n", "There are a number of ways to deploy Redis. 
For local development, the quickest method is to use the [Redis Stack Docker container](https://hub.docker.com/r/redis/redis-stack), which we will use here. Redis Stack contains a number of Redis modules that can be used together to create a fast, multi-model data store and query engine.\n", "\n", "For production use cases, the easiest way to get started is to use the [Redis Cloud](https://redislabs.com/redis-enterprise-cloud/overview/) service. Redis Cloud is a fully managed Redis service. You can also deploy Redis on your own infrastructure using [Redis Enterprise](https://redislabs.com/redis-enterprise/overview/). Redis Enterprise can be deployed on Kubernetes, on-premises, or in the cloud.\n", "\n", "Additionally, every major cloud provider ([AWS Marketplace](https://aws.amazon.com/marketplace/pp/prodview-e6y7ork67pjwg?sr=0-2&ref_=beagle&applicationId=AWSMPContessa), [Google Marketplace](https://console.cloud.google.com/marketplace/details/redislabs-public/redis-enterprise?pli=1), and [Azure Marketplace](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/garantiadata.redis_enterprise_1sp_public_preview?tab=Overview)) offers Redis Enterprise through its marketplace.\n", "\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "f1a618c5", "metadata": {}, "source": [ "## Prerequisites\n", "\n", "Before we start this project, we need to set up the following:\n", "\n", "* start a Redis database with RediSearch (redis-stack)\n", "* install libraries\n", "    * [redis-py](https://github.com/redis/redis-py)\n", "* get your [OpenAI API key](https://beta.openai.com/account/api-keys)\n", "\n", "===========================================================\n", "\n", "### Start Redis\n", "\n", "To keep this example simple, we will use the Redis Stack Docker container, which we can start as follows:\n", "\n", "```bash\n", "$ docker-compose up -d\n", "```\n", "\n", "This also includes the 
[RedisInsight](https://redis.com/redis-enterprise/redis-insight/) GUI for managing your Redis database, which you can view at [http://localhost:8001](http://localhost:8001) once you start the Docker container.\n", "\n", "You're all set up and ready to go! Next, we import and create our client for communicating with the Redis database we just created." ] }, { "attachments": {}, "cell_type": "markdown", "id": "b9babafe", "metadata": {}, "source": [ "## Install Requirements\n", "\n", "redis-py is the Python client for communicating with Redis. We will use it to communicate with our Redis Stack database. " ] }, { "cell_type": "code", "execution_count": null, "id": "2b04113f", "metadata": {}, "outputs": [], "source": [ "! pip install redis wget pandas openai" ] }, { "attachments": {}, "cell_type": "markdown", "id": "36fe86f4", "metadata": {}, "source": [ "===========================================================\n", "## Prepare your OpenAI API key\n", "\n", "The OpenAI API key is used to vectorize query data.\n", "\n", "If you don't have an OpenAI API key, you can get one from [https://beta.openai.com/account/api-keys](https://beta.openai.com/account/api-keys).\n", "\n", "Once you get your key, please add it to your environment variables as `OPENAI_API_KEY`." ] }, { "cell_type": "code", "execution_count": 2, "id": "88be138c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "OPENAI_API_KEY is ready\n" ] } ], "source": [ "# Test that your OpenAI API key is correctly set as an environment variable\n", "# Note: if you run this notebook locally, you will need to reload your terminal and the notebook for the env variables to be live.\n", "import os\n", "import openai\n", "\n", "# Note: 
alternatively you can set a temporary env variable like this:\n", "# os.environ[\"OPENAI_API_KEY\"] = 'sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'\n", "\n", "if os.getenv(\"OPENAI_API_KEY\") is not None:\n", "    openai.api_key = os.getenv(\"OPENAI_API_KEY\")\n", "    print(\"OPENAI_API_KEY is ready\")\n", "else:\n", "    print(\"OPENAI_API_KEY environment variable not found\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "97fefe4c", "metadata": {}, "source": [ "## Load data\n", "\n", "In this section, we'll load data that has already been converted into embedding vectors. We'll use this data to create an index in Redis and then search for similar vectors." ] }, { "cell_type": "code", "execution_count": 3, "id": "9fbebe0d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "File Downloaded\n" ] }, { "data": { "text/html": [ "
|   | id | url | title | text | title_vector | content_vector | vector_id |
|---|---|---|---|---|---|---|---|
| 0 | 1 | https://simple.wikipedia.org/wiki/April | April | April is the fourth month of the year in the J... | [0.001009464613161981, -0.020700545981526375, ... | [-0.011253940872848034, -0.013491976074874401,... | 0 |
| 1 | 2 | https://simple.wikipedia.org/wiki/August | August | August (Aug.) is the eighth month of the year ... | [0.0009286514250561595, 0.000820168002974242, ... | [0.0003609954728744924, 0.007262262050062418, ... | 1 |
| 2 | 6 | https://simple.wikipedia.org/wiki/Art | Art | Art is a creative activity that expresses imag... | [0.003393713850528002, 0.0061537534929811954, ... | [-0.004959689453244209, 0.015772193670272827, ... | 2 |
| 3 | 8 | https://simple.wikipedia.org/wiki/A | A | A or a is the first letter of the English alph... | [0.0153952119871974, -0.013759135268628597, 0.... | [0.024894846603274345, -0.022186409682035446, ... | 3 |
| 4 | 9 | https://simple.wikipedia.org/wiki/Air | Air | Air refers to the Earth's atmosphere. Air is a... | [0.02224554680287838, -0.02044147066771984, -0... | [0.021524671465158463, 0.018522677943110466, -... | 4 |