Pushing updates to batch embeddings and clean up evaluation section

2025-05-09 19:32:38 +00:00 · 2023-05-12 12:53:06 +01:00 · 5d9e877909
commit 5d9e877909
parent a3918a9d60
6 changed files with 1916 additions and 60840 deletions
--- a/apps/enterprise-knowledge-retrieval/README.md
+++ b/apps/enterprise-knowledge-retrieval/README.md
@@ -1,9 +1,9 @@
 # Enterprise Knowledge Retrieval

-This repo is a deep dive on Enterprise Knowledge Retrieval, which aims to take some unstructured text documents and create a usable knowledge base application with it.
+This app is a deep dive on Enterprise Knowledge Retrieval, which aims to take some unstructured text documents and create a usable knowledge base application with it.

 This repo contains a notebook and a basic Streamlit app:
-- `enterprise_knowledge_retrieval.ipynb`: A notebook containing a step by step process of tokenising, chunking and embedding your data in a vector database, building a chat agent on top and running a basic evaluation of its performance
+- `enterprise_knowledge_retrieval.ipynb`: A notebook containing a step by step process of tokenising, chunking and embedding your data in a vector database, building a chat agent on top and running a basic evaluation of its performance.
 - `chatbot.py`: A Streamlit app providing simple Q&A via a search bar to query your knowledge base.

 To run the app, please follow the instructions below in the ```App``` section
@@ -21,16 +21,16 @@ Once you've run the notebook through to the Search stage, you should have what y

 ## App

-We've rolled in a basic Streamlit app that you can interact with to test your retrieval service using either standard semantic search or Hyde retrievals.
+We've rolled in a basic Streamlit app that you can interact with to test your retrieval service using either standard semantic search or [HyDE](https://arxiv.org/abs/2212.10496) retrievals.

-You can use it by:
-- Ensuring you followed the Setup and Storage steps from the notebook to populate a vector database with searchable content.
-- Setting up a virtual environment with pip by running ```virtualenv venv``` (ensure ```virtualenv``` is installed).
+To use it:
+- Ensure you followed the Setup and Storage steps from the notebook to populate a vector database with searchable content.
+- Set up a virtual environment with pip by running ```virtualenv venv``` (ensure ```virtualenv``` is installed).
 - Activate the environment by running ```source venv/bin/activate```.
 - Install requirements by running ```pip install -r requirements.txt```.
-- Run ```streamlit run chatbot.py``` to fire up the Streamlit app in your browser
+- Run ```streamlit run chatbot.py``` to fire up the Streamlit app in your browser.

 ## Limitations

 - This app uses Redis as a vector database, but there are many other options highlighted `../examples/vector_databases` depending on your need.
-- We introduce many areas you may optimize in the notebook, but we'll deep dive on these in separate offerings in the coming weeks.
+- We introduce many areas you may optimize in the notebook, but we'll deep dive on these in subsequent cookbooks.
--- a/apps/enterprise-knowledge-retrieval/assistant.py
+++ b/apps/enterprise-knowledge-retrieval/assistant.py
@@ -167,3 +167,17 @@ def initiate_agent(tools):
    )

    return agent_executor
+
+
+def ask_gpt(query):
+    response = openai.ChatCompletion.create(
+        model=CHAT_MODEL,
+        messages=[
+            {
+                "role": "user",
+                "content": "Please answer my question.\nQuestion: {}".format(query),
+            }
+        ],
+        temperature=0,
+    )
+    return response["choices"][0]["message"]["content"]
--- a/apps/enterprise-knowledge-retrieval/chatbot.py
+++ b/apps/enterprise-knowledge-retrieval/chatbot.py
@@ -4,7 +4,12 @@ import streamlit as st
 from streamlit_chat import message

 from database import get_redis_connection
-from assistant import answer_user_question, initiate_agent, answer_question_hyde
+from assistant import (
+    answer_user_question,
+    initiate_agent,
+    answer_question_hyde,
+    ask_gpt,
+)

 # Initialise database

@@ -36,7 +41,12 @@ tools = [
        if add_selectbox == "Standard vector search"
        else answer_question_hyde,
        description="Useful for when you need to answer general knowledge questions. Input should be a fully formed question.",
-    )
+    ),
+    Tool(
+        name="Ask",
+        func=ask_gpt,
+        description="Useful if the question is not general knowledge. Input should be a fully formed question.",
+    ),
 ]

 if "generated" not in st.session_state:
--- a/apps/enterprise-knowledge-retrieval/config.py
+++ b/apps/enterprise-knowledge-retrieval/config.py
@@ -1,5 +1,5 @@
 REDIS_HOST = "localhost"
-REDIS_PORT = "6380"
+REDIS_PORT = "6379"
 REDIS_DB = "0"
 INDEX_NAME = "wiki-index"
 VECTOR_FIELD_NAME = "content_vector"
--- a/apps/enterprise-knowledge-retrieval/data/wikipedia_articles_2000.csv
+++ b/apps/enterprise-knowledge-retrieval/data/wikipedia_articles_2000.csv
--- a/apps/enterprise-knowledge-retrieval/enterprise_knowledge_retrieval.ipynb
+++ b/apps/enterprise-knowledge-retrieval/enterprise_knowledge_retrieval.ipynb
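
Reviewer note: the `ask_gpt` helper added in `assistant.py` can be exercised without network access by injecting the completion call. The sketch below is not part of this commit. It reproduces the prompt format and response parsing from the diff, while `build_messages`, the `create` parameter, `fake_create` and the default model name are hypothetical names introduced here for illustration (the real `CHAT_MODEL` value lives in config and is not shown in this diff).

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def build_messages(query: str) -> List[Message]:
    """Wrap the raw question in the single-turn prompt used by ask_gpt in the diff."""
    return [
        {
            "role": "user",
            "content": "Please answer my question.\nQuestion: {}".format(query),
        }
    ]


def ask_gpt(query: str, create: Callable[..., dict], model: str = "gpt-3.5-turbo") -> str:
    """Call a chat-completion function and return the first choice's text.

    `create` stands in for openai.ChatCompletion.create (assumption: it is
    injected here only so the example runs offline). The default `model` is
    a placeholder, not the repository's actual CHAT_MODEL.
    """
    response = create(model=model, messages=build_messages(query), temperature=0)
    return response["choices"][0]["message"]["content"]


def fake_create(**kwargs) -> dict:
    """Hypothetical stub that echoes the prompt back in a chat-completion-shaped dict."""
    prompt = kwargs["messages"][0]["content"]
    return {"choices": [{"message": {"role": "assistant", "content": "echo: " + prompt}}]}


if __name__ == "__main__":
    print(ask_gpt("What is Redis?", create=fake_create))
```

Passing the real completion function as `create` should behave like the committed helper, assuming the response keeps the `choices[0].message.content` shape used in the diff.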