"# Vector similarity search using Neon Postgres\n",
"\n",
"This notebook guides you through using [Neon Serverless Postgres](https://neon.tech/) as a vector database for OpenAI embeddings. It demonstrates how to:\n",
"\n",
"1. Use embeddings created by the OpenAI API.\n",
"2. Store embeddings in a Neon Serverless Postgres database.\n",
"3. Convert a raw text query to an embedding with the OpenAI API.\n",
"4. Use Neon with the `pgvector` extension to perform vector similarity search."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"\n",
"Before you begin, ensure that you have the following:\n",
"\n",
"1. A Neon Postgres database. You can create an account and set up a project with a ready-to-use `neondb` database in a few simple steps. For instructions, see [Sign up](https://neon.tech/docs/get-started-with-neon/signing-up) and [Create your first project](https://neon.tech/docs/get-started-with-neon/setting-up-a-project).\n",
"2. A connection string for your Neon database. You can copy it from the **Connection Details** widget on the Neon **Dashboard**. See [Connect from any application](https://neon.tech/docs/connect/connect-from-any-app).\n",
"3. The `pgvector` extension. Install the extension in Neon by running `CREATE EXTENSION vector;`. For instructions, see [Enable the pgvector extension](https://neon.tech/docs/extensions/pgvector#enable-the-pgvector-extension). \n",
"4. Your [OpenAI API key](https://platform.openai.com/account/api-keys).\n",
"5. Python and `pip`."
]
},
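{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"If you prefer to enable `pgvector` from Python instead of running the statement in the Neon SQL Editor, the following is a minimal sketch that works with any DB-API cursor, such as the `psycopg2` cursor this notebook uses to talk to Neon. `ensure_pgvector` and `FakeCursor` are hypothetical helpers written for illustration. `IF NOT EXISTS` makes the statement safe to rerun, and the `pg_extension` catalog lookup confirms the installation. The fake cursor only lets the helper run without a database.\n",
"\n",
"```python\n",
"def ensure_pgvector(cursor):\n",
"    # Hypothetical helper: enable pgvector (idempotent) and return its installed version\n",
"    cursor.execute('CREATE EXTENSION IF NOT EXISTS vector;')\n",
"    # pg_extension lists the extensions installed in the current database\n",
"    cursor.execute('SELECT extversion FROM pg_extension WHERE extname = %s;', ('vector',))\n",
"    row = cursor.fetchone()\n",
"    return row[0] if row else None\n",
"\n",
"\n",
"class FakeCursor:\n",
"    # Stand-in for a psycopg2 cursor: records statements and returns a placeholder row\n",
"    def __init__(self):\n",
"        self.executed = []\n",
"\n",
"    def execute(self, sql, params=None):\n",
"        self.executed.append((sql, params))\n",
"\n",
"    def fetchone(self):\n",
"        return ('fake-version',)\n",
"\n",
"\n",
"cur = FakeCursor()\n",
"version = ensure_pgvector(cur)\n",
"print(version)  # fake-version\n",
"```\n",
"\n",
"With a real `psycopg2` connection, call `connection.commit()` after `ensure_pgvector(cursor)`. `psycopg2` runs statements inside a transaction that it opens implicitly, so the change is not saved until you commit."
]
},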
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Install required modules\n",
"\n",
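"This notebook requires the `openai`, `psycopg2`, `pandas`, `wget`, and `python-dotenv` packages. You can install them with `pip install openai psycopg2 pandas wget python-dotenv`.\n",
"\n",
"### Prepare your OpenAI API key\n",
"\n",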
"An OpenAI API key is required to generate vectors for documents and queries.\n",
"\n",
"If you do not have an OpenAI API key, obtain one from https://platform.openai.com/account/api-keys.\n",
"\n",
"Add the OpenAI API key as an operating system environment variable or provide it for the session when prompted. If you define an environment variable, name the variable `OPENAI_API_KEY`.\n",
"\n",
"For information about configuring your OpenAI API key as an environment variable, refer to [Best Practices for API Key Safety](https://help.openai.com/en/articles/5112595-best-practices-for-api-key-safety)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Test your OpenAI API key"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Your OPENAI_API_KEY is ready\n"
]
}
],
"source": [
"# Test to ensure that your OpenAI API key is defined as an environment variable or provide it when prompted\n",
"# If you run this notebook locally, you may have to reload the terminal and the notebook to make the environment variable available\n",
"\n",
"import os\n",
"from getpass import getpass\n",
"\n",
"# Check if OPENAI_API_KEY is set as an environment variable\n",
"if os.getenv(\"OPENAI_API_KEY\") is not None:\n",
"    print(\"Your OPENAI_API_KEY is ready\")\n",
"else:\n",
"    # If not, prompt for it\n",
"    api_key = getpass(\"Enter your OPENAI_API_KEY: \")\n",
"    if api_key:\n",
"        print(\"Your OPENAI_API_KEY is now available for this session\")\n",
"        # Set it as an environment variable for the current session\n",
"        os.environ[\"OPENAI_API_KEY\"] = api_key\n",
"    else:\n",
"        print(\"You did not enter your OPENAI_API_KEY\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Connect to your Neon database\n",
"\n",
"Provide your Neon database connection string below or define it in a `.env` file using a `DATABASE_URL` variable. For information about obtaining a Neon connection string, see [Connect from any application](https://neon.tech/docs/connect/connect-from-any-app)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import psycopg2\n",
"from dotenv import load_dotenv\n",
"\n",
"# Load environment variables from .env file\n",
"load_dotenv()\n",
"\n",
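"# You can provide the connection string directly here.\n",
"# Replace the empty string with your Neon connection string, for example:\n",
"# \"postgresql://<user>:<password>@<hostname>/<dbname>\"\n",
"connection_string = \"\"\n",
"\n",
"# Otherwise, fall back to DATABASE_URL from the environment or the .env file\n",
"if not connection_string:\n",
"    connection_string = os.environ.get(\"DATABASE_URL\")\n",
"if not connection_string:\n",
"    raise ValueError(\"Provide a connection string in the code or as DATABASE_URL in a .env file.\")\n",
"\n",
"# Connect to the database and create a cursor\n",
"connection = psycopg2.connect(connection_string)\n",
"cursor = connection.cursor()\n",
"\n",
"# Execute this query to test the database connection\n",
"cursor.execute(\"SELECT 1;\")\n",
"result = cursor.fetchone()\n",
"\n",
"# Check the query result\n",
"if result == (1,):\n",
"    print(\"Your database connection was successful!\")\n",
"else:\n",
"    print(\"Your connection failed.\")"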
]
},
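{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"When the connection string itself is malformed, the error from `psycopg2.connect` can be hard to read. As an optional pre-flight check, the following minimal sketch parses a connection string with Python's standard `urllib.parse` module and reports its parts without the password. `check_connection_string` is a hypothetical helper, and the example URL is a made-up placeholder. Neon connection strings copied from the dashboard typically end in `?sslmode=require`, so the helper also reports the `sslmode` value.\n",
"\n",
"```python\n",
"from urllib.parse import parse_qs, urlsplit\n",
"\n",
"\n",
"def check_connection_string(url):\n",
"    # Hypothetical pre-flight check: validate a Postgres URL and return its parts, minus the password\n",
"    parts = urlsplit(url)\n",
"    if parts.scheme not in ('postgres', 'postgresql'):\n",
"        raise ValueError('expected a postgres:// or postgresql:// URL')\n",
"    if not parts.username or not parts.hostname:\n",
"        raise ValueError('the URL needs a user and a host')\n",
"    dbname = parts.path.lstrip('/')\n",
"    if not dbname:\n",
"        raise ValueError('the URL needs a database name')\n",
"    # parse_qs maps each query parameter to a list of values\n",
"    sslmode = parse_qs(parts.query).get('sslmode', [None])[0]\n",
"    return {'user': parts.username, 'host': parts.hostname, 'dbname': dbname, 'sslmode': sslmode}\n",
"\n",
"\n",
"# Placeholder URL for illustration only\n",
"info = check_connection_string(\n",
"    'postgresql://alex:secret@ep-example-123456.us-east-2.aws.neon.tech/neondb?sslmode=require'\n",
")\n",
"print(info)\n",
"```\n",
"\n",
"You could call `check_connection_string` on your connection string before passing it to `psycopg2.connect`. The returned dictionary omits the password, so printing it does not leak the secret into the notebook output."
]
},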
{
"cell_type": "markdown",
"metadata": {},
"source": [
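"This guide uses pre-computed Wikipedia article embeddings available in the OpenAI Cookbook `examples` directory, so you do not have to spend your own OpenAI credits computing embeddings."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# SQL statement for creating the articles table (1536-dimensional vector columns)\n",
"create_table_sql = '''\n",
"CREATE TABLE IF NOT EXISTS public.articles (\n",
"    id INTEGER PRIMARY KEY, url TEXT, title TEXT, content TEXT,\n",
"    title_vector vector(1536), content_vector vector(1536), vector_id INTEGER\n",
");\n",
"'''\n",
"\n",
"# SQL statement for creating ivfflat indexes on the vector columns\n",
"create_indexes_sql = '''\n",
"CREATE INDEX ON public.articles USING ivfflat (content_vector) WITH (lists = 1000);\n",
"\n",
"CREATE INDEX ON public.articles USING ivfflat (title_vector) WITH (lists = 1000);\n",
"'''\n",
"\n",
"# Execute the SQL statements\n",
"cursor.execute(create_table_sql)\n",
"cursor.execute(create_indexes_sql)\n",
"\n",
"# Commit the changes\n",
"connection.commit()"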
]
},
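{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"The index statements above use `lists = 1000`. For comparison, the pgvector README offers these starting points:\n",
"\n",
"- Use `rows / 1000` lists for up to 1M rows and `sqrt(rows)` lists beyond that.\n",
"- Query with about `sqrt(lists)` probes.\n",
"- Create ivfflat indexes after the table already contains data.\n",
"\n",
"The hypothetical helpers below simply encode that rule of thumb. By that rule, the 25,000-row `articles` table would start at 25 lists. Treat the result as a starting point to tune against measured recall and latency.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"\n",
"def suggested_ivfflat_lists(rows):\n",
"    # pgvector README rule of thumb: rows / 1000 up to 1M rows, sqrt(rows) above that\n",
"    if rows <= 1_000_000:\n",
"        return max(1, rows // 1000)\n",
"    return int(math.sqrt(rows))\n",
"\n",
"\n",
"def suggested_probes(lists):\n",
"    # Starting point for ivfflat.probes: roughly sqrt(lists)\n",
"    return max(1, round(math.sqrt(lists)))\n",
"\n",
"\n",
"lists = suggested_ivfflat_lists(25_000)\n",
"probes = suggested_probes(lists)\n",
"print(lists, probes)  # 25 5\n",
"\n",
"# Statements you could run with a cursor (hypothetical usage)\n",
"index_sql = f'CREATE INDEX ON public.articles USING ivfflat (content_vector) WITH (lists = {lists});'\n",
"probes_sql = f'SET ivfflat.probes = {probes};'\n",
"print(index_sql)\n",
"print(probes_sql)\n",
"```\n",
"\n",
"`SET ivfflat.probes` applies only to the current session, so run it on the same connection before the similarity search."
]
},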
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load the data\n",
"\n",
"Load the pre-computed vector data into your `articles` table from the `.csv` file. There are 25,000 records, so expect the operation to take several minutes.\n",
"\n",
"Start by defining the `query_neon` function, which runs when you perform the vector similarity search. The function creates an embedding of the user's query, prepares the SQL query, and runs it with that embedding. The pre-computed embeddings that you loaded into your database were created with the OpenAI `text-embedding-3-small` model, so you must use the same model to create the query embedding for the similarity search.\n",
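"\n",
"As a rough sketch of those steps, the code below shows one way the pieces could fit together. It is not this notebook's `query_neon` implementation, and it rests on the following assumptions:\n",
"\n",
"- `embed_query`, `to_pgvector_literal`, and `build_search_sql` are hypothetical helpers.\n",
"- `embed_query` assumes the `openai` Python library v1.x (`client.embeddings.create`) and is not called here.\n",
"- The SQL uses pgvector's `<->` (L2 distance) operator, which the default ivfflat operator class indexes.\n",
"\n",
"OpenAI embeddings are normalized to length 1, so ranking by L2 distance gives the same order as ranking by cosine distance. The small check at the end illustrates this with toy 2-D unit vectors.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"MODEL = 'text-embedding-3-small'  # must match the model that produced the stored vectors\n",
"\n",
"\n",
"def embed_query(client, text):\n",
"    # Assumes an openai>=1.0 client, e.g. openai.OpenAI(); makes a network call\n",
"    return client.embeddings.create(model=MODEL, input=text).data[0].embedding\n",
"\n",
"\n",
"def to_pgvector_literal(embedding):\n",
"    # pgvector parses vectors written as '[x1,x2,...]'\n",
"    return '[' + ','.join(str(float(x)) for x in embedding) + ']'\n",
"\n",
"\n",
"def build_search_sql(vector_column='content_vector'):\n",
"    # Whitelist the column name, since identifiers cannot be passed as query parameters\n",
"    if vector_column not in ('content_vector', 'title_vector'):\n",
"        raise ValueError('unknown vector column')\n",
"    # Parameters: (vector literal, vector literal, top_k)\n",
"    return (\n",
"        f'SELECT id, title, {vector_column} <-> %s::vector AS distance '\n",
"        f'FROM public.articles ORDER BY {vector_column} <-> %s::vector LIMIT %s;'\n",
"    )\n",
"\n",
"\n",
"def l2_distance(a, b):\n",
"    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))\n",
"\n",
"\n",
"def cosine_distance(a, b):\n",
"    dot = sum(x * y for x, y in zip(a, b))\n",
"    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))\n",
"    return 1 - dot / norms\n",
"\n",
"\n",
"# Toy unit vectors: L2 and cosine distance rank them the same way\n",
"query = [1.0, 0.0]\n",
"docs = {'a': [0.6, 0.8], 'b': [0.0, 1.0], 'c': [0.8, 0.6]}\n",
"by_l2 = sorted(docs, key=lambda k: l2_distance(query, docs[k]))\n",
"by_cosine = sorted(docs, key=lambda k: cosine_distance(query, docs[k]))\n",
"print(by_l2, by_cosine)  # ['c', 'a', 'b'] ['c', 'a', 'b']\n",
"print(to_pgvector_literal(query))  # [1.0,0.0]\n",
"```\n",
"\n",
"With a live connection, a call could look like `cursor.execute(build_search_sql(), (literal, literal, 5))`, where `literal = to_pgvector_literal(embed_query(client, 'modern art in Europe'))`.\n",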