diff --git a/examples/third_party_examples/Visualizing_embeddings_in_Kangas.ipynb b/examples/third_party_examples/Visualizing_embeddings_in_Kangas.ipynb
new file mode 100644
index 0000000..bc0c377
--- /dev/null
+++ b/examples/third_party_examples/Visualizing_embeddings_in_Kangas.ipynb
@@ -0,0 +1,439 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "0wjP9mrldJsd"
+ },
+ "source": [
+ "## Visualizing the embeddings in Kangas\n",
+ "\n",
+ "In this Jupyter Notebook, we construct a Kangas DataGrid containing the data and projections of the embeddings into 2 dimensions."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "4tPKQqqldJsj"
+ },
+ "source": [
+ "## What is Kangas?\n",
+ "\n",
+ "[Kangas](https://github.com/comet-ml/kangas/) is an open-source, mixed-media, dataframe-like tool for data scientists. It was developed by [Comet](https://comet.com/), a company that aims to reduce the friction of moving models into production."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "6sNsB2iFdJsk"
+ },
+ "source": [
+ "### 1. Setup\n",
+ "\n",
+ "To get started, we install Kangas with pip and import it."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "N8gi529adL-f",
+ "outputId": "c12e9973-a179-41e3-c5a8-f241804d99ad"
+ },
+ "outputs": [],
+ "source": [
+ "%pip install kangas --quiet"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {
+ "id": "htxjXThodRxD"
+ },
+ "outputs": [],
+ "source": [
+ "import kangas as kg"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### 2. Constructing a Kangas DataGrid\n",
+ "\n",
+ "We create a Kangas DataGrid with the original data and the embeddings. The data consists of rows of reviews, and each embedding consists of 1536 floating-point values. In this example, we get the data directly from GitHub, in case you aren't running this notebook inside OpenAI's repo.\n",
+ "\n",
+ "We use Kangas to read the CSV file into a DataGrid for further processing."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "0SxWlRTrdVJq",
+ "outputId": "d36c3a14-2e80-4315-e285-f39f6b008976"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Loading CSV file 'fine_food_reviews_with_embeddings_1k.csv'...\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "1001it [00:00, 2412.90it/s]\n",
+ "100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:00<00:00, 2899.16it/s]\n"
+ ]
+ }
+ ],
+ "source": [
+ "data = kg.read_csv(\"https://raw.githubusercontent.com/openai/openai-cookbook/main/examples/data/fine_food_reviews_with_embeddings_1k.csv\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We can review the fields of the CSV file:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "bzhQgoRGeMCp",
+ "outputId": "791c4e40-fb28-409e-d1e9-20b753fb1215"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "DataGrid (in memory)\n",
+ " Name : fine_food_reviews_with_embeddings_1k\n",
+ " Rows : 1,000\n",
+ " Columns: 9\n",
+ "# Column Non-Null Count DataGrid Type \n",
+ "--- -------------------- --------------- --------------------\n",
+ "1 Column 1 1,000 INTEGER \n",
+ "2 ProductId 1,000 TEXT \n",
+ "3 UserId 1,000 TEXT \n",
+ "4 Score 1,000 INTEGER \n",
+ "5 Summary 1,000 TEXT \n",
+ "6 Text 1,000 TEXT \n",
+ "7 combined 1,000 TEXT \n",
+ "8 n_tokens 1,000 INTEGER \n",
+ "9 embedding 1,000 TEXT \n"
+ ]
+ }
+ ],
+ "source": [
+ "data.info()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "And get a glimpse of the first and last rows:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 349
+ },
+ "id": "Q95N832aeaBr",
+ "outputId": "aaea2816-e5a1-4e52-f228-c3e6aca6fa3e"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "row-id | Column 1 | ProductId | UserId | Score | Summary | Text | combined | n_tokens | embedding\n",
+ "1 | 0 | B003XPF9BO | A3R7JR3FMEBXQB | 5 | where does one | Wanted to save | Title: where do | 52 | [0.007018072064\n",
+ "2 | 297 | B003VXHGPK | A21VWSCGW7UUAR | 4 | Good, but not W | Honestly, I hav | Title: Good, bu | 178 | [-0.00314055196\n",
+ "3 | 296 | B008JKTTUA | A34XBAIFT02B60 | 1 | Should advertis | First, these sh | Title: Should a | 78 | [-0.01757248118\n",
+ "4 | 295 | B000LKTTTW | A14MQ40CCU8B13 | 5 | Best tomato sou | I have a hard t | Title: Best tom | 111 | [-0.00139322795\n",
+ "5 | 294 | B001D09KAM | A34XBAIFT02B60 | 1 | Should advertis | First, these sh | Title: Should a | 78 | [-0.01757248118\n",
+ "...\n",
+ "996 | 623 | B0000CFXYA | A3GS4GWPIBV0NT | 1 | Strange inflamm | Truthfully wasn | Title: Strange | 110 | [0.000110913533\n",
+ "997 | 624 | B0001BH5YM | A1BZ3HMAKK0NC | 5 | My favorite and | You've just got | Title: My favor | 80 | [-0.02086931467\n",
+ "998 | 625 | B0009ET7TC | A2FSDQY5AI6TNX | 5 | My furbabies LO | Shake the conta | Title: My furba | 47 | [-0.00974910240\n",
+ "999 | 619 | B007PA32L2 | A15FF2P7RPKH6G | 5 | got this for th | all i have hear | Title: got this | 50 | [-0.00521062919\n",
+ "1000 | 999 | B001EQ5GEO | A3VYU0VO6DYV6I | 5 | I love Maui Cof | My first experi | Title: I love M | 118 | [-0.00605782261\n",
+ "\n",
+ "[1000 rows x 9 columns]\n",
+ "\n",
+ "* Use DataGrid.save() to save to disk\n",
+ "** Use DataGrid.show() to start user interface"
+ ]
+ },
+ "execution_count": 5,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "data"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Now we create a new DataGrid, converting each embedding (stored in the CSV as a string of numbers) into a Kangas `Embedding`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {
+ "id": "Bu0erP68dvLU"
+ },
+ "outputs": [],
+ "source": [
+ "import ast  # parses the string form of a list of numbers into a Python list\n",
+ "\n",
+ "# New DataGrid with the same columns; Score is stored as text\n",
+ "dg = kg.DataGrid(\n",
+ "    name=\"openai_embeddings\",\n",
+ "    columns=data.get_columns(),\n",
+ "    converters={\"Score\": str},\n",
+ ")\n",
+ "for row in data:\n",
+ "    # Column order: row[3] is Score, row[4] is Summary, row[8] is embedding\n",
+ "    embedding = ast.literal_eval(row[8])\n",
+ "    row[8] = kg.Embedding(\n",
+ "        embedding,\n",
+ "        name=str(row[3]),\n",
+ "        text=\"%s - %.10s\" % (row[3], row[4]),\n",
+ "        projection=\"umap\",\n",
+ "    )\n",
+ "    dg.append(row)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The new DataGrid now has an embedding column with the proper datatype, `EMBEDDING-ASSET`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "gd6Od4Bmhijy",
+ "outputId": "9aa38221-0272-4a63-e393-706e0a0c5879"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "DataGrid (in memory)\n",
+ " Name : openai_embeddings\n",
+ " Rows : 1,000\n",
+ " Columns: 9\n",
+ "# Column Non-Null Count DataGrid Type \n",
+ "--- -------------------- --------------- --------------------\n",
+ "1 Column 1 1,000 INTEGER \n",
+ "2 ProductId 1,000 TEXT \n",
+ "3 UserId 1,000 TEXT \n",
+ "4 Score 1,000 TEXT \n",
+ "5 Summary 1,000 TEXT \n",
+ "6 Text 1,000 TEXT \n",
+ "7 combined 1,000 TEXT \n",
+ "8 n_tokens 1,000 INTEGER \n",
+ "9 embedding 1,000 EMBEDDING-ASSET \n"
+ ]
+ }
+ ],
+ "source": [
+ "dg.info()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We simply save the DataGrid, and we're done."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "dg.save()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### 3. Render 2D Projections\n",
+ "\n",
+ "To render the data directly in the notebook, simply show it. Note that each row contains an embedding projection.\n",
+ "\n",
+ "Scroll to the far right to see the embedding projection for each row.\n",
+ "\n",
+ "The color of each point in projection space represents the Score."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 771
+ },
+ "id": "Z8j-GdpiijU0",
+ "outputId": "20a0b1ca-3059-4384-cd8c-b32b1aa1c270"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "dg.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Group by \"Score\" to see the rows in each group."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "dg.show(group=\"Score\", sort=\"Score\", rows=5, select=\"Score,embedding\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "vLIxfmK5dJsq"
+ },
+ "source": [
+ "An example of this DataGrid is hosted here: https://kangas.comet.com/?datagrid=/data/openai_embeddings.datagrid"
+ ]
+ }
+ ],
+ "metadata": {
+ "accelerator": "TPU",
+ "colab": {
+ "gpuType": "V100",
+ "machine_shape": "hm",
+ "provenance": []
+ },
+ "gpuClass": "standard",
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.11"
+ },
+ "vscode": {
+ "interpreter": {
+ "hash": "365536dcbde60510dc9073d6b991cd35db2d9bac356a11f5b64279a5e6708b97"
+ }
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}