{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Using AnalyticDB as a vector database for OpenAI embeddings\n",
"\n",
"This notebook guides you step by step through using AnalyticDB as a vector database for OpenAI embeddings.\n",
"\n",
"This notebook presents an end-to-end process of:\n",
"1. Using precomputed embeddings created with the OpenAI API.\n",
"2. Storing the embeddings in a cloud instance of AnalyticDB.\n",
"3. Converting a raw text query to an embedding with the OpenAI API.\n",
"4. Using AnalyticDB to perform the nearest neighbour search in the created collection.\n",
"\n",
"### What is AnalyticDB\n",
"\n",
"[AnalyticDB](https://www.alibabacloud.com/help/en/analyticdb-for-postgresql/latest/product-introduction-overview) is a high-performance distributed vector database. It is fully compatible with PostgreSQL syntax, so it is effortless to adopt. AnalyticDB is a cloud-native database managed by Alibaba Cloud, built around a high-performance vector compute engine. It offers an out-of-the-box experience that scales to billions of vectors, with rich features including multiple indexing algorithms, support for structured and unstructured data, real-time updates, multiple distance metrics, scalar filtering, and time-travel searches. It also provides full OLAP database functionality and an SLA commitment for production use.\n",
"\n",
"### Deployment options\n",
"\n",
"- Using [AnalyticDB Cloud Vector Database](https://www.alibabacloud.com/help/zh/analyticdb-for-postgresql/latest/overview-2). [Click here](https://www.alibabacloud.com/product/hybriddb-postgresql) to deploy it quickly.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"\n",
"For the purposes of this exercise we need to prepare a couple of things:\n",
"\n",
"1. An AnalyticDB cloud server instance.\n",
"2. The `psycopg2` library to interact with the vector database. Any other PostgreSQL client library will work as well.\n",
"3. An [OpenAI API key](https://beta.openai.com/account/api-keys).\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can validate that the server is reachable before going any further. One way to do this is sketched below.\n"
]
},
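{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal connectivity sketch: open a TCP socket to the database endpoint. It assumes the connection details are exported as the `PGHOST` and `PGPORT` environment variables (the defaults below are placeholders); any PostgreSQL client would serve equally well here.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import socket\n",
"\n",
"# Placeholder defaults; set PGHOST/PGPORT to point at your AnalyticDB instance\n",
"host = os.environ.get(\"PGHOST\", \"localhost\")\n",
"port = int(os.environ.get(\"PGPORT\", \"5432\"))\n",
"\n",
"# Try to open a TCP connection; connect_ex returns 0 on success\n",
"with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:\n",
"    sock.settimeout(5)\n",
"    reachable = sock.connect_ex((host, port)) == 0\n",
"\n",
"print(\"Server reachable\" if reachable else f\"Cannot reach {host}:{port}\")"
]
},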
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Install requirements\n",
"\n",
"This notebook requires the `openai` and `psycopg2` packages, as well as a few additional libraries we will use along the way. The following command installs them all:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecuteTime": {
"end_time": "2023-02-16T12:05:05.718972Z",
"start_time": "2023-02-16T12:04:30.434820Z"
},
"pycharm": {
"is_executing": true
}
},
"outputs": [],
"source": [
"! pip install openai psycopg2 pandas wget"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prepare your OpenAI API key\n",
"\n",
"The OpenAI API key is used for vectorization of the documents and queries.\n",
"\n",
"If you don't have an OpenAI API key, you can get one from [https://beta.openai.com/account/api-keys](https://beta.openai.com/account/api-keys).\n",
"\n",
"Once you get your key, please add it to your environment variables as `OPENAI_API_KEY`."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"ExecuteTime": {
"end_time": "2023-02-16T12:05:05.730338Z",
"start_time": "2023-02-16T12:05:05.723351Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"OPENAI_API_KEY is ready\n"
]
}
],
"source": [
"# Test that your OpenAI API key is correctly set as an environment variable\n",
"# Note: if you run this notebook locally, you will need to reload your terminal and the notebook for the env variables to be live.\n",
"import os\n",
"\n",
"# Note: alternatively you can set a temporary env variable like this:\n",
"# os.environ[\"OPENAI_API_KEY\"] = \"sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\"\n",
"\n",
"if os.getenv(\"OPENAI_API_KEY\") is not None:\n",
"    print(\"OPENAI_API_KEY is ready\")\n",
"else:\n",
"    print(\"OPENAI_API_KEY environment variable not found\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Connect to AnalyticDB\n",
"\n",
"First add the connection details to your environment variables, or simply change the `psycopg2.connect` parameters below.\n",
"\n",
"Connecting to a running instance of AnalyticDB server is easy with the official Python library:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"ExecuteTime": {
"end_time": "2023-02-16T12:05:06.827143Z",
"start_time": "2023-02-16T12:05:05.733771Z"
}
},
"outputs": [],
"source": [
"import os\n",
"import psycopg2\n",
"\n",
"# Note: alternatively you can set temporary env variables like this:\n",
"# os.environ[\"PGHOST\"] = \"your_host\"\n",
"# os.environ[\"PGPORT\"] = \"5432\"\n",
"# os.environ[\"PGDATABASE\"] = \"postgres\"\n",
"# os.environ[\"PGUSER\"] = \"user\"\n",
"# os.environ[\"PGPASSWORD\"] = \"password\"\n",
"\n",
"connection = psycopg2.connect(\n",
"    host=os.environ.get(\"PGHOST\", \"localhost\"),\n",
"    port=os.environ.get(\"PGPORT\", \"5432\"),\n",
"    database=os.environ.get(\"PGDATABASE\", \"postgres\"),\n",
"    user=os.environ.get(\"PGUSER\", \"user\"),\n",
"    password=os.environ.get(\"PGPASSWORD\", \"password\")\n",
")\n",
"\n",
"# Create a new cursor object\n",
"cursor = connection.cursor()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can test the connection by running a simple query:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2023-02-16T12:05:06.848488Z",
"start_time": "2023-02-16T12:05:06.832612Z"
},
"pycharm": {
"is_executing": true
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Connection successful!\n"
]
}
],
"source": [
"# Execute a simple query to test the connection\n",
"cursor.execute(\"SELECT 1;\")\n",
"result = cursor.fetchone()\n",
"\n",
"# Check the query result\n",
"if result == (1,):\n",
"    print(\"Connection successful!\")\n",
"else:\n",
"    print(\"Connection failed.\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Download data\n",
"\n",
"In this section we download a dataset of precomputed embeddings of Wikipedia articles, so we don't have to compute the embeddings ourselves:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"ExecuteTime": {
"end_time": "2023-02-16T12:05:37.371951Z",
"start_time": "2023-02-16T12:05:06.851634Z"
},
"pycharm": {
"is_executing": true
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"100% [......................................................................] 698933052 / 698933052"
]
},
{
"data": {
"text/plain": [
"'vector_database_wikipedia_articles_embedded.zip'"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import wget\n",
"\n",
"embeddings_url = \"https://cdn.openai.com/API/examples/data/vector_database_wikipedia_articles_embedded.zip\"\n",
"\n",
"# The file is ~700 MB so this will take some time\n",
"wget.download(embeddings_url)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The downloaded file then has to be extracted:"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"ExecuteTime": {
"end_time": "2023-02-16T12:06:01.538851Z",
"start_time": "2023-02-16T12:05:37.376042Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The file vector_database_wikipedia_articles_embedded.csv exists in the data directory.\n"
]
}
],
"source": [
"import zipfile\n",
"import os\n",
"\n",
"current_directory = os.getcwd()\n",
"zip_file_path = os.path.join(current_directory, \"vector_database_wikipedia_articles_embedded.zip\")\n",
"output_directory = os.path.join(current_directory, \"../../data\")\n",
"\n",
"with zipfile.ZipFile(zip_file_path, \"r\") as zip_ref:\n",
"    zip_ref.extractall(output_directory)\n",
"\n",
"# Check that the extracted CSV file exists\n",
"file_name = \"vector_database_wikipedia_articles_embedded.csv\"\n",
"file_path = os.path.join(output_directory, file_name)\n",
"\n",
"if os.path.exists(file_path):\n",
"    print(f\"The file {file_name} exists in the data directory.\")\n",
"else:\n",
"    print(f\"The file {file_name} does not exist in the data directory.\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Index data\n",
"\n",
"AnalyticDB stores data in __relations__ where each object is described by at least one vector. Our relation will be called **articles** and each object will be described by both **title** and **content** vectors.\n",
"\n",
"We will start by creating the relation, adding a vector index on both **title** and **content**, and then we will fill it with our precomputed embeddings."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"ExecuteTime": {
"end_time": "2023-02-16T12:17:36.366066Z",
"start_time": "2023-02-16T12:17:35.486872Z"
}
},
"outputs": [],
"source": [
"# SQL statement for creating the table\n",
"create_table_sql = '''\n",
"CREATE TABLE IF NOT EXISTS public.articles (\n",
"    id INTEGER NOT NULL,\n",
"    url TEXT,\n",
"    title TEXT,\n",
"    content TEXT,\n",
"    title_vector REAL[],\n",
"    content_vector REAL[],\n",
"    vector_id INTEGER\n",
");\n",
"\n",
"ALTER TABLE public.articles ADD PRIMARY KEY (id);\n",
"'''\n",
"\n",
"# SQL statements for creating the ANN indexes\n",
"create_indexes_sql = '''\n",
"CREATE INDEX ON public.articles USING ann (content_vector) WITH (distancemeasure = l2, dim = '1536', pq_segments = '64', hnsw_m = '100', pq_centers = '2048');\n",
"\n",
"CREATE INDEX ON public.articles USING ann (title_vector) WITH (distancemeasure = l2, dim = '1536', pq_segments = '64', hnsw_m = '100', pq_centers = '2048');\n",
"'''\n",
"\n",
"# Execute the SQL statements\n",
"cursor.execute(create_table_sql)\n",
"cursor.execute(create_indexes_sql)\n",
"\n",
"# Commit the changes\n",
"connection.commit()"
]
},
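{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since AnalyticDB is PostgreSQL-compatible, we can optionally verify that the table and the two ANN indexes were created by querying the standard `pg_indexes` catalog view. This is only a sanity-check sketch; the index names are assigned automatically and may differ on your instance."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# List the indexes that now exist on public.articles\n",
"cursor.execute(\n",
"    \"SELECT indexname, indexdef FROM pg_indexes WHERE schemaname = 'public' AND tablename = 'articles';\"\n",
")\n",
"for index_name, index_def in cursor.fetchall():\n",
"    print(index_name)\n",
"    print(f\"    {index_def}\")"
]
},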
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load data\n",
"\n",
"In this section we are going to load the data prepared previously, so you don't have to recompute the embeddings of the Wikipedia articles with your own credits."
]
},
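{
"cell_type": "markdown",
"metadata": {},
"source": [
"One detail worth spelling out: the CSV file stores each embedding as a JSON-style list such as `[0.1, 0.2]`, while PostgreSQL array literals for the `REAL[]` columns use braces, such as `{0.1, 0.2}`. The loader below rewrites the brackets on the fly. A minimal sketch of that transformation, on a made-up CSV line:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical CSV line for illustration only; the real vectors have 1536 dimensions\n",
"sample_line = '1,https://example.org,Title,Content,\"[0.1, 0.2]\",\"[0.3, 0.4]\",1\\n'\n",
"\n",
"# The same bracket-to-brace rewrite applied by the loader below\n",
"print(sample_line.replace('[', '{').replace(']', '}'))"
]
},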
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecuteTime": {
"end_time": "2023-02-16T12:30:37.518210Z",
"start_time": "2023-02-16T12:17:36.368564Z"
},
"pycharm": {
"is_executing": true
},
"scrolled": false
},
"outputs": [],
"source": [
"import io\n",
"\n",
"# Path to your local CSV file\n",
"csv_file_path = '../../data/vector_database_wikipedia_articles_embedded.csv'\n",
"\n",
"# Define a generator function to process the file line by line\n",
"def process_file(file_path):\n",
"    with open(file_path, 'r') as file:\n",
"        for line in file:\n",
"            # Replace '[' with '{' and ']' with '}'\n",
"            modified_line = line.replace('[', '{').replace(']', '}')\n",
"            yield modified_line\n",
"\n",
"# Create a StringIO object holding the modified lines\n",
"modified_lines = io.StringIO(''.join(process_file(csv_file_path)))\n",
"\n",
"# Create the COPY command for the copy_expert method\n",
"copy_command = '''\n",
"COPY public.articles (id, url, title, content, title_vector, content_vector, vector_id)\n",
"FROM STDIN WITH (FORMAT CSV, HEADER true, DELIMITER ',');\n",
"'''\n",
"\n",
"# Execute the COPY command using the copy_expert method\n",
"cursor.copy_expert(copy_command, modified_lines)\n",
"\n",
"# Commit the changes\n",
"connection.commit()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2023-02-16T12:30:40.675202Z",
"start_time": "2023-02-16T12:30:40.655654Z"
},
"pycharm": {
"is_executing": true
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Count:25000\n"
]
}
],
"source": [
"# Check the table size to make sure all the rows have been stored\n",
"count_sql = \"\"\"SELECT count(*) FROM public.articles;\"\"\"\n",
"cursor.execute(count_sql)\n",
"result = cursor.fetchone()\n",
"print(f\"Count:{result[0]}\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Search data\n",
"\n",
"Once the data is stored in AnalyticDB, we can query the table for the vectors closest to a query. The additional `vector_name` parameter switches between title-based and content-based search. Since the precomputed embeddings were created with the `text-embedding-ada-002` OpenAI model, we also have to use it during search.\n"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"ExecuteTime": {
"end_time": "2023-02-16T12:30:38.024370Z",
"start_time": "2023-02-16T12:30:37.712816Z"
}
},
"outputs": [],
"source": [
"def query_analyticdb(query, collection_name, vector_name=\"title_vector\", top_k=20):\n",
"    # Create an embedding vector from the user query\n",
"    embedded_query = openai.Embedding.create(\n",
"        input=query,\n",
"        model=\"text-embedding-ada-002\",\n",
"    )[\"data\"][0][\"embedding\"]\n",
"\n",
"    # Convert the embedded query to a PostgreSQL-compatible array literal\n",
"    embedded_query_pg = \"{\" + \",\".join(map(str, embedded_query)) + \"}\"\n",
"\n",
"    # Create the SQL query\n",
"    query_sql = f\"\"\"\n",
"    SELECT id, url, title, l2_distance({vector_name},'{embedded_query_pg}'::real[]) AS similarity\n",
"    FROM {collection_name}\n",
"    ORDER BY {vector_name} <-> '{embedded_query_pg}'::real[]\n",
"    LIMIT {top_k};\n",
"    \"\"\"\n",
"\n",
"    # Execute the query\n",
"    cursor.execute(query_sql)\n",
"    results = cursor.fetchall()\n",
"\n",
"    return results"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"ExecuteTime": {
"end_time": "2023-02-16T12:30:39.379566Z",
"start_time": "2023-02-16T12:30:38.031041Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1. Museum of Modern Art (Score: 0.75)\n",
"2. Western Europe (Score: 0.735)\n",
"3. Renaissance art (Score: 0.728)\n",
"4. Pop art (Score: 0.721)\n",
"5. Northern Europe (Score: 0.71)\n",
"6. Hellenistic art (Score: 0.706)\n",
"7. Modernist literature (Score: 0.694)\n",
"8. Art film (Score: 0.687)\n",
"9. Central Europe (Score: 0.685)\n",
"10. European (Score: 0.683)\n",
"11. Art (Score: 0.683)\n",
"12. Byzantine art (Score: 0.682)\n",
"13. Postmodernism (Score: 0.68)\n",
"14. Eastern Europe (Score: 0.679)\n",
"15. Europe (Score: 0.678)\n",
"16. Cubism (Score: 0.678)\n",
"17. Impressionism (Score: 0.677)\n",
"18. Bauhaus (Score: 0.676)\n",
"19. Surrealism (Score: 0.674)\n",
"20. Expressionism (Score: 0.674)\n"
]
}
],
"source": [
"import openai\n",
"\n",
"query_results = query_analyticdb(\"modern art in Europe\", \"Articles\")\n",
"for i, result in enumerate(query_results):\n",
"    print(f\"{i + 1}. {result[2]} (Score: {round(1 - result[3], 3)})\")"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"ExecuteTime": {
"end_time": "2023-02-16T12:30:40.652676Z",
"start_time": "2023-02-16T12:30:39.382555Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1. Battle of Bannockburn (Score: 0.739)\n",
"2. Wars of Scottish Independence (Score: 0.723)\n",
"3. 1651 (Score: 0.705)\n",
"4. First War of Scottish Independence (Score: 0.699)\n",
"5. Robert I of Scotland (Score: 0.692)\n",
"6. 841 (Score: 0.688)\n",
"7. 1716 (Score: 0.688)\n",
"8. 1314 (Score: 0.674)\n",
"9. 1263 (Score: 0.673)\n",
"10. William Wallace (Score: 0.671)\n",
"11. Stirling (Score: 0.663)\n",
"12. 1306 (Score: 0.662)\n",
"13. 1746 (Score: 0.661)\n",
"14. 1040s (Score: 0.656)\n",
"15. 1106 (Score: 0.654)\n",
"16. 1304 (Score: 0.653)\n",
"17. David II of Scotland (Score: 0.65)\n",
"18. Braveheart (Score: 0.649)\n",
"19. 1124 (Score: 0.648)\n",
"20. July 27 (Score: 0.646)\n"
]
}
],
"source": [
"# This time we'll query using the content vector\n",
"query_results = query_analyticdb(\"Famous battles in Scottish history\", \"Articles\", \"content_vector\")\n",
"for i, result in enumerate(query_results):\n",
"    print(f\"{i + 1}. {result[2]} (Score: {round(1 - result[3], 3)})\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.7"
}
},
"nbformat": 4,
"nbformat_minor": 1
}