mirror of
https://github.com/james-m-jordan/openai-cookbook.git
synced 2025-05-09 19:32:38 +00:00
1649 lines
62 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Multiclass Classification for Transactions\n",
|
|
"\n",
|
|
"In this notebook we classify a public dataset of transactions into a set of predefined categories. The same approaches should carry over to any multiclass classification use case where transactional data has to be fitted into predefined categories, and by the end you should have a few approaches for dealing with both labelled and unlabelled datasets.\n",
|
|
"\n",
|
|
"The different approaches we'll be taking in this notebook are:\n",
|
|
"- **Zero-shot Classification:** First we'll use zero-shot classification to put transactions into one of five named buckets, using only a prompt for guidance\n",
|
|
"- **Classification with Embeddings:** Next we'll create embeddings for a labelled dataset and train a traditional classification model on them to test how well they identify our categories\n",
|
|
"- **Fine-tuned Classification:** Lastly we'll fine-tune a model on our labelled dataset to see how it compares to the zero-shot and embeddings-based approaches"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Setup"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"%load_ext autoreload\n",
|
|
"%autoreload\n",
|
|
"%pip install openai 'openai[datalib]' 'openai[embeddings]' transformers scikit-learn matplotlib plotly pandas scipy\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 56,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import openai\n",
|
|
"import pandas as pd\n",
|
|
"import numpy as np\n",
|
|
"import json\n",
|
|
"import os\n",
|
|
"\n",
|
|
"COMPLETIONS_MODEL = \"gpt-4\"\n",
|
|
"os.environ[\"OPENAI_API_KEY\"] = \"<your-api-key>\"\n",
|
|
"client = openai.OpenAI()"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Load dataset\n",
|
|
"\n",
|
|
"We're using a public dataset of transactions over £25k made by the National Library of Scotland. The dataset has three features that we'll be using:\n",
|
|
"- Supplier: The name of the supplier\n",
|
|
"- Description: A text description of the transaction\n",
|
|
"- Value: The value of the transaction in GBP\n",
|
|
"\n",
|
|
"**Source**:\n",
|
|
"\n",
|
|
"https://data.nls.uk/data/organisational-data/transactions-over-25k/"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 152,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Number of transactions: 359\n",
|
|
" Date Supplier Description \\\n",
|
|
"0 21/04/2016 M & J Ballantyne Ltd George IV Bridge Work \n",
|
|
"1 26/04/2016 Private Sale Literary & Archival Items \n",
|
|
"2 30/04/2016 City Of Edinburgh Council Non Domestic Rates \n",
|
|
"3 09/05/2016 Computacenter Uk Kelvin Hall \n",
|
|
"4 09/05/2016 John Graham Construction Ltd Causewayside Refurbishment \n",
|
|
"\n",
|
|
" Transaction value (£) \n",
|
|
"0 35098.0 \n",
|
|
"1 30000.0 \n",
|
|
"2 40800.0 \n",
|
|
"3 72835.0 \n",
|
|
"4 64361.0 \n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"transactions = pd.read_csv('./data/25000_spend_dataset_current.csv', encoding='unicode_escape')\n",
|
|
"print(f\"Number of transactions: {len(transactions)}\")\n",
|
|
"print(transactions.head())\n"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Zero-shot Classification\n",
|
|
"\n",
|
|
"We'll first assess how well the base model classifies these transactions using a simple prompt. We'll provide the model with five categories plus a catch-all of \"Could not classify\" for transactions it cannot place."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 154,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"zero_shot_prompt = '''You are a data expert working for the National Library of Scotland.\n",
|
|
"You are analysing all transactions over £25,000 in value and classifying them into one of five categories.\n",
|
|
"The five categories are Building Improvement, Literature & Archive, Utility Bills, Professional Services and Software/IT.\n",
|
|
"If you can't tell what it is, say Could not classify\n",
|
|
"\n",
|
|
"Transaction:\n",
|
|
"\n",
|
|
"Supplier: {}\n",
|
|
"Description: {}\n",
|
|
"Value: {}\n",
|
|
"\n",
|
|
"The classification is:'''\n",
|
|
"\n",
|
|
"def format_prompt(transaction):\n",
|
|
" return zero_shot_prompt.format(transaction['Supplier'], transaction['Description'], transaction['Transaction value (£)'])\n",
|
|
"\n",
|
|
"def classify_transaction(transaction):\n",
|
|
" prompt = format_prompt(transaction)\n",
|
|
" messages = [\n",
|
|
" {\"role\": \"system\", \"content\": prompt},\n",
|
|
" ]\n",
|
|
" completion_response = client.chat.completions.create(\n",
|
|
" messages=messages,\n",
|
|
" temperature=0,\n",
|
|
" max_tokens=5,\n",
|
|
" top_p=1,\n",
|
|
" frequency_penalty=0,\n",
|
|
" presence_penalty=0,\n",
|
|
" model=COMPLETIONS_MODEL)\n",
|
|
" label = completion_response.choices[0].message.content.replace('\\n','')\n",
|
|
" return label\n"
|
|
]
|
|
},
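The model's reply is not guaranteed to be exactly one of the category names: it may add punctuation or a preamble, and with `max_tokens=5` a longer reply would be cut short. A minimal sketch of mapping a raw reply onto the allowed labels follows; `normalize_label`, `ALLOWED_LABELS` and `FALLBACK_LABEL` are hypothetical names that are not part of the notebook, and the matching rules are one reasonable choice, not the author's method.

```python
# Hypothetical helper: map a raw model reply onto the notebook's label set.
ALLOWED_LABELS = [
    "Building Improvement",
    "Literature & Archive",
    "Utility Bills",
    "Professional Services",
    "Software/IT",
]
FALLBACK_LABEL = "Could not classify"


def normalize_label(raw: str) -> str:
    """Return the allowed label found in the reply, else the fallback."""
    cleaned = raw.strip().strip(".").lower()
    # Exact match first, ignoring case and a trailing full stop.
    for label in ALLOWED_LABELS:
        if cleaned == label.lower():
            return label
    # Otherwise accept a reply that mentions exactly one category name.
    found = [label for label in ALLOWED_LABELS if label.lower() in cleaned]
    if len(found) == 1:
        return found[0]
    return FALLBACK_LABEL


# Example usage
print(normalize_label(" Building Improvement.\n"))            # Building Improvement
print(normalize_label("The classification is: Software/IT"))  # Software/IT
print(normalize_label("Could not classify"))                  # Could not classify
print(normalize_label("Building"))                            # Could not classify
```

Applied in `classify_transaction`, this would wrap the returned text, e.g. `return normalize_label(label)`, so downstream counts only ever see the six expected values.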
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 155,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Transaction: M & J Ballantyne Ltd George IV Bridge Work 35098.0\n",
|
|
"Classification: Building Improvement\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"# Get a test transaction\n",
|
|
"transaction = transactions.iloc[0]\n",
|
|
"# Use our completion function to return a prediction\n",
|
|
"print(f\"Transaction: {transaction['Supplier']} {transaction['Description']} {transaction['Transaction value (£)']}\")\n",
|
|
"print(f\"Classification: {classify_transaction(transaction)}\")\n"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Our first attempt is correct: M & J Ballantyne Ltd are a house builder, and the work they performed is indeed Building Improvement.\n",
|
|
"\n",
|
|
"Let's expand the sample to 25 transactions and see how it performs, again with just a simple prompt to guide it."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 156,
|
|
"metadata": {},
|
|
"outputs": [],
"source": [
"# Copy the slice so that adding a column doesn't trigger pandas' SettingWithCopyWarning\n",
"test_transactions = transactions.iloc[:25].copy()\n",
"test_transactions['Classification'] = test_transactions.apply(classify_transaction, axis=1)\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 157,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"Classification\n",
|
|
"Building Improvement 17\n",
|
|
"Literature & Archive 3\n",
|
|
"Software/IT 2\n",
|
|
"Could not classify 2\n",
|
|
"Utility Bills 1\n",
|
|
"Name: count, dtype: int64"
|
|
]
|
|
},
|
|
"execution_count": 157,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"test_transactions['Classification'].value_counts()\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 158,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>Date</th>\n",
|
|
" <th>Supplier</th>\n",
|
|
" <th>Description</th>\n",
|
|
" <th>Transaction value (£)</th>\n",
|
|
" <th>Classification</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>21/04/2016</td>\n",
|
|
" <td>M & J Ballantyne Ltd</td>\n",
|
|
" <td>George IV Bridge Work</td>\n",
|
|
" <td>35098.0</td>\n",
|
|
" <td>Building Improvement</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>26/04/2016</td>\n",
|
|
" <td>Private Sale</td>\n",
|
|
" <td>Literary & Archival Items</td>\n",
|
|
" <td>30000.0</td>\n",
|
|
" <td>Literature & Archive</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td>30/04/2016</td>\n",
|
|
" <td>City Of Edinburgh Council</td>\n",
|
|
" <td>Non Domestic Rates</td>\n",
|
|
" <td>40800.0</td>\n",
|
|
" <td>Utility Bills</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>3</th>\n",
|
|
" <td>09/05/2016</td>\n",
|
|
" <td>Computacenter Uk</td>\n",
|
|
" <td>Kelvin Hall</td>\n",
|
|
" <td>72835.0</td>\n",
|
|
" <td>Software/IT</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>4</th>\n",
|
|
" <td>09/05/2016</td>\n",
|
|
" <td>John Graham Construction Ltd</td>\n",
|
|
" <td>Causewayside Refurbishment</td>\n",
|
|
" <td>64361.0</td>\n",
|
|
" <td>Building Improvement</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>5</th>\n",
|
|
" <td>09/05/2016</td>\n",
|
|
" <td>A McGillivray</td>\n",
|
|
" <td>Causewayside Refurbishment</td>\n",
|
|
" <td>53690.0</td>\n",
|
|
" <td>Building Improvement</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>6</th>\n",
|
|
" <td>16/05/2016</td>\n",
|
|
" <td>John Graham Construction Ltd</td>\n",
|
|
" <td>Causewayside Refurbishment</td>\n",
|
|
" <td>365344.0</td>\n",
|
|
" <td>Building Improvement</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>7</th>\n",
|
|
" <td>23/05/2016</td>\n",
|
|
" <td>Computacenter Uk</td>\n",
|
|
" <td>Kelvin Hall</td>\n",
|
|
" <td>26506.0</td>\n",
|
|
" <td>Software/IT</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>8</th>\n",
|
|
" <td>23/05/2016</td>\n",
|
|
" <td>ECG Facilities Service</td>\n",
|
|
" <td>Facilities Management Charge</td>\n",
|
|
" <td>32777.0</td>\n",
|
|
" <td>Building Improvement</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>9</th>\n",
|
|
" <td>23/05/2016</td>\n",
|
|
" <td>ECG Facilities Service</td>\n",
|
|
" <td>Facilities Management Charge</td>\n",
|
|
" <td>32777.0</td>\n",
|
|
" <td>Building Improvement</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>10</th>\n",
|
|
" <td>30/05/2016</td>\n",
|
|
" <td>ALDL</td>\n",
|
|
" <td>ALDL Charges</td>\n",
|
|
" <td>32317.0</td>\n",
|
|
" <td>Could not classify</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>11</th>\n",
|
|
" <td>10/06/2016</td>\n",
|
|
" <td>Wavetek Ltd</td>\n",
|
|
" <td>Kelvin Hall</td>\n",
|
|
" <td>87589.0</td>\n",
|
|
" <td>Building Improvement</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>12</th>\n",
|
|
" <td>10/06/2016</td>\n",
|
|
" <td>John Graham Construction Ltd</td>\n",
|
|
" <td>Causewayside Refurbishment</td>\n",
|
|
" <td>381803.0</td>\n",
|
|
" <td>Building Improvement</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>13</th>\n",
|
|
" <td>28/06/2016</td>\n",
|
|
" <td>ECG Facilities Service</td>\n",
|
|
" <td>Facilities Management Charge</td>\n",
|
|
" <td>32832.0</td>\n",
|
|
" <td>Building Improvement</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>14</th>\n",
|
|
" <td>30/06/2016</td>\n",
|
|
" <td>Glasgow City Council</td>\n",
|
|
" <td>Kelvin Hall</td>\n",
|
|
" <td>1700000.0</td>\n",
|
|
" <td>Building Improvement</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>15</th>\n",
|
|
" <td>11/07/2016</td>\n",
|
|
" <td>Wavetek Ltd</td>\n",
|
|
" <td>Kelvin Hall</td>\n",
|
|
" <td>65692.0</td>\n",
|
|
" <td>Building Improvement</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>16</th>\n",
|
|
" <td>11/07/2016</td>\n",
|
|
" <td>John Graham Construction Ltd</td>\n",
|
|
" <td>Causewayside Refurbishment</td>\n",
|
|
" <td>139845.0</td>\n",
|
|
" <td>Building Improvement</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>17</th>\n",
|
|
" <td>15/07/2016</td>\n",
|
|
" <td>Sotheby'S</td>\n",
|
|
" <td>Literary & Archival Items</td>\n",
|
|
" <td>28500.0</td>\n",
|
|
" <td>Literature & Archive</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>18</th>\n",
|
|
" <td>18/07/2016</td>\n",
|
|
" <td>Christies</td>\n",
|
|
" <td>Literary & Archival Items</td>\n",
|
|
" <td>33800.0</td>\n",
|
|
" <td>Literature & Archive</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>19</th>\n",
|
|
" <td>25/07/2016</td>\n",
|
|
" <td>A McGillivray</td>\n",
|
|
" <td>Causewayside Refurbishment</td>\n",
|
|
" <td>30113.0</td>\n",
|
|
" <td>Building Improvement</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>20</th>\n",
|
|
" <td>31/07/2016</td>\n",
|
|
" <td>ALDL</td>\n",
|
|
" <td>ALDL Charges</td>\n",
|
|
" <td>32317.0</td>\n",
|
|
" <td>Could not classify</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>21</th>\n",
|
|
" <td>08/08/2016</td>\n",
|
|
" <td>ECG Facilities Service</td>\n",
|
|
" <td>Facilities Management Charge</td>\n",
|
|
" <td>32795.0</td>\n",
|
|
" <td>Building Improvement</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>22</th>\n",
|
|
" <td>15/08/2016</td>\n",
|
|
" <td>Creative Video Productions Ltd</td>\n",
|
|
" <td>Kelvin Hall</td>\n",
|
|
" <td>26866.0</td>\n",
|
|
" <td>Building Improvement</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>23</th>\n",
|
|
" <td>15/08/2016</td>\n",
|
|
" <td>John Graham Construction Ltd</td>\n",
|
|
" <td>Causewayside Refurbishment</td>\n",
|
|
" <td>196807.0</td>\n",
|
|
" <td>Building Improvement</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>24</th>\n",
|
|
" <td>24/08/2016</td>\n",
|
|
" <td>ECG Facilities Service</td>\n",
|
|
" <td>Facilities Management Charge</td>\n",
|
|
" <td>32795.0</td>\n",
|
|
" <td>Building Improvement</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" Date Supplier Description \\\n",
|
|
"0 21/04/2016 M & J Ballantyne Ltd George IV Bridge Work \n",
|
|
"1 26/04/2016 Private Sale Literary & Archival Items \n",
|
|
"2 30/04/2016 City Of Edinburgh Council Non Domestic Rates \n",
|
|
"3 09/05/2016 Computacenter Uk Kelvin Hall \n",
|
|
"4 09/05/2016 John Graham Construction Ltd Causewayside Refurbishment \n",
|
|
"5 09/05/2016 A McGillivray Causewayside Refurbishment \n",
|
|
"6 16/05/2016 John Graham Construction Ltd Causewayside Refurbishment \n",
|
|
"7 23/05/2016 Computacenter Uk Kelvin Hall \n",
|
|
"8 23/05/2016 ECG Facilities Service Facilities Management Charge \n",
|
|
"9 23/05/2016 ECG Facilities Service Facilities Management Charge \n",
|
|
"10 30/05/2016 ALDL ALDL Charges \n",
|
|
"11 10/06/2016 Wavetek Ltd Kelvin Hall \n",
|
|
"12 10/06/2016 John Graham Construction Ltd Causewayside Refurbishment \n",
|
|
"13 28/06/2016 ECG Facilities Service Facilities Management Charge \n",
|
|
"14 30/06/2016 Glasgow City Council Kelvin Hall \n",
|
|
"15 11/07/2016 Wavetek Ltd Kelvin Hall \n",
|
|
"16 11/07/2016 John Graham Construction Ltd Causewayside Refurbishment \n",
|
|
"17 15/07/2016 Sotheby'S Literary & Archival Items \n",
|
|
"18 18/07/2016 Christies Literary & Archival Items \n",
|
|
"19 25/07/2016 A McGillivray Causewayside Refurbishment \n",
|
|
"20 31/07/2016 ALDL ALDL Charges \n",
|
|
"21 08/08/2016 ECG Facilities Service Facilities Management Charge \n",
|
|
"22 15/08/2016 Creative Video Productions Ltd Kelvin Hall \n",
|
|
"23 15/08/2016 John Graham Construction Ltd Causewayside Refurbishment \n",
|
|
"24 24/08/2016 ECG Facilities Service Facilities Management Charge \n",
|
|
"\n",
|
|
" Transaction value (£) Classification \n",
|
|
"0 35098.0 Building Improvement \n",
|
|
"1 30000.0 Literature & Archive \n",
|
|
"2 40800.0 Utility Bills \n",
|
|
"3 72835.0 Software/IT \n",
|
|
"4 64361.0 Building Improvement \n",
|
|
"5 53690.0 Building Improvement \n",
|
|
"6 365344.0 Building Improvement \n",
|
|
"7 26506.0 Software/IT \n",
|
|
"8 32777.0 Building Improvement \n",
|
|
"9 32777.0 Building Improvement \n",
|
|
"10 32317.0 Could not classify \n",
|
|
"11 87589.0 Building Improvement \n",
|
|
"12 381803.0 Building Improvement \n",
|
|
"13 32832.0 Building Improvement \n",
|
|
"14 1700000.0 Building Improvement \n",
|
|
"15 65692.0 Building Improvement \n",
|
|
"16 139845.0 Building Improvement \n",
|
|
"17 28500.0 Literature & Archive \n",
|
|
"18 33800.0 Literature & Archive \n",
|
|
"19 30113.0 Building Improvement \n",
|
|
"20 32317.0 Could not classify \n",
|
|
"21 32795.0 Building Improvement \n",
|
|
"22 26866.0 Building Improvement \n",
|
|
"23 196807.0 Building Improvement \n",
|
|
"24 32795.0 Building Improvement "
|
|
]
|
|
},
|
|
"execution_count": 158,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"test_transactions.head(25)\n"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Initial results are pretty good, even with no labelled examples. The two transactions it could not classify were the ALDL Charges rows, which give few clues as to their topic; a cleaner labelled dataset with more examples may help us get better performance."
|
|
]
|
|
},
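Because the labelled set below was built by hand-correcting zero-shot labels, it can be useful to measure how often the zero-shot label survived review and to list the rows that changed. A minimal pandas sketch follows; the `zero_shot` and `corrected` column names and the toy rows are hypothetical and are not taken from the notebook's data.

```python
import pandas as pd

# Hypothetical toy data: 'zero_shot' holds model labels, 'corrected' holds
# hand-checked labels. These rows are illustrative, not from the dataset.
labels = pd.DataFrame(
    {
        "Supplier": ["Supplier A", "Supplier B", "Supplier C", "Supplier D"],
        "zero_shot": [
            "Building Improvement",
            "Could not classify",
            "Utility Bills",
            "Software/IT",
        ],
        "corrected": [
            "Building Improvement",
            "Other",
            "Utility Bills",
            "Building Improvement",
        ],
    }
)

# Fraction of rows where the zero-shot label was kept unchanged.
agreement = (labels["zero_shot"] == labels["corrected"]).mean()

# Rows whose label was changed, e.g. to review which cases the prompt misses.
changed = labels[labels["zero_shot"] != labels["corrected"]]

print(f"Agreement: {agreement:.0%}")  # Agreement: 50%
print(changed[["Supplier", "zero_shot", "corrected"]])
```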
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Classification with Embeddings\n",
|
|
"\n",
|
|
"Let's create embeddings from the small set that we've classified so far. We built this labelled set by running the zero-shot classifier on 101 transactions from our dataset and manually correcting the 15 **Could not classify** results; note that the corrected labels include an **Other** category that wasn't in the zero-shot prompt.\n",
|
|
"\n",
|
|
"### Create embeddings\n",
|
|
"\n",
|
|
"This section reuses the approach from the [Get_embeddings_from_dataset notebook](Get_embeddings_from_dataset.ipynb) to create embeddings from a combined field that concatenates all of our features."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 159,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>Date</th>\n",
|
|
" <th>Supplier</th>\n",
|
|
" <th>Description</th>\n",
|
|
" <th>Transaction value (£)</th>\n",
|
|
" <th>Classification</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>15/08/2016</td>\n",
|
|
" <td>Creative Video Productions Ltd</td>\n",
|
|
" <td>Kelvin Hall</td>\n",
|
|
" <td>26866</td>\n",
|
|
" <td>Other</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>29/05/2017</td>\n",
|
|
" <td>John Graham Construction Ltd</td>\n",
|
|
" <td>Causewayside Refurbishment</td>\n",
|
|
" <td>74806</td>\n",
|
|
" <td>Building Improvement</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td>29/05/2017</td>\n",
|
|
" <td>Morris & Spottiswood Ltd</td>\n",
|
|
" <td>George IV Bridge Work</td>\n",
|
|
" <td>56448</td>\n",
|
|
" <td>Building Improvement</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>3</th>\n",
|
|
" <td>31/05/2017</td>\n",
|
|
" <td>John Graham Construction Ltd</td>\n",
|
|
" <td>Causewayside Refurbishment</td>\n",
|
|
" <td>164691</td>\n",
|
|
" <td>Building Improvement</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>4</th>\n",
|
|
" <td>24/07/2017</td>\n",
|
|
" <td>John Graham Construction Ltd</td>\n",
|
|
" <td>Causewayside Refurbishment</td>\n",
|
|
" <td>27926</td>\n",
|
|
" <td>Building Improvement</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" Date Supplier Description \\\n",
|
|
"0 15/08/2016 Creative Video Productions Ltd Kelvin Hall \n",
|
|
"1 29/05/2017 John Graham Construction Ltd Causewayside Refurbishment \n",
|
|
"2 29/05/2017 Morris & Spottiswood Ltd George IV Bridge Work \n",
|
|
"3 31/05/2017 John Graham Construction Ltd Causewayside Refurbishment \n",
|
|
"4 24/07/2017 John Graham Construction Ltd Causewayside Refurbishment \n",
|
|
"\n",
|
|
" Transaction value (£) Classification \n",
|
|
"0 26866 Other \n",
|
|
"1 74806 Building Improvement \n",
|
|
"2 56448 Building Improvement \n",
|
|
"3 164691 Building Improvement \n",
|
|
"4 27926 Building Improvement "
|
|
]
|
|
},
|
|
"execution_count": 159,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"df = pd.read_csv('./data/labelled_transactions.csv')\n",
|
|
"df.head()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 160,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>Date</th>\n",
|
|
" <th>Supplier</th>\n",
|
|
" <th>Description</th>\n",
|
|
" <th>Transaction value (£)</th>\n",
|
|
" <th>Classification</th>\n",
|
|
" <th>combined</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>15/08/2016</td>\n",
|
|
" <td>Creative Video Productions Ltd</td>\n",
|
|
" <td>Kelvin Hall</td>\n",
|
|
" <td>26866</td>\n",
|
|
" <td>Other</td>\n",
|
|
" <td>Supplier: Creative Video Productions Ltd; Desc...</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>29/05/2017</td>\n",
|
|
" <td>John Graham Construction Ltd</td>\n",
|
|
" <td>Causewayside Refurbishment</td>\n",
|
|
" <td>74806</td>\n",
|
|
" <td>Building Improvement</td>\n",
|
|
" <td>Supplier: John Graham Construction Ltd; Descri...</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" Date Supplier Description \\\n",
|
|
"0 15/08/2016 Creative Video Productions Ltd Kelvin Hall \n",
|
|
"1 29/05/2017 John Graham Construction Ltd Causewayside Refurbishment \n",
|
|
"\n",
|
|
" Transaction value (£) Classification \\\n",
|
|
"0 26866 Other \n",
|
|
"1 74806 Building Improvement \n",
|
|
"\n",
|
|
" combined \n",
|
|
"0 Supplier: Creative Video Productions Ltd; Desc... \n",
|
|
"1 Supplier: John Graham Construction Ltd; Descri... "
|
|
]
|
|
},
|
|
"execution_count": 160,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
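"# Convert the value column element-wise; str(df[...]) would put the whole column's printout in every row\n",
"df['combined'] = \"Supplier: \" + df['Supplier'].str.strip() + \"; Description: \" + df['Description'].str.strip() + \"; Value: \" + df['Transaction value (£)'].astype(str).str.strip()\n",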
|
|
"df.head(2)"
|
|
]
|
|
},
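When building a per-row text field, the value column has to be converted element-wise. Calling `str()` on a whole Series returns one string holding the column's printed representation (index, every value and the dtype line), and concatenating that onto each row gives every row the same long suffix. The sketch below contrasts the two on a hypothetical two-row frame that reuses the notebook's column names.

```python
import pandas as pd

# Hypothetical two-row frame with the notebook's column names.
df = pd.DataFrame(
    {
        "Supplier": [" Supplier A ", "Supplier B"],
        "Description": ["Roof repair", " Archive purchase "],
        "Transaction value (£)": [35098.0, 30000.0],
    }
)

prefix = (
    "Supplier: " + df["Supplier"].str.strip()
    + "; Description: " + df["Description"].str.strip()
    + "; Value: "
)

# str() on a Series yields ONE string for the whole column (index, values, dtype).
whole_column = prefix + str(df["Transaction value (£)"]).strip()

# .astype(str) converts each value separately, so each row gets only its own value.
per_row = prefix + df["Transaction value (£)"].astype(str)

print(per_row.tolist())
# ['Supplier: Supplier A; Description: Roof repair; Value: 35098.0',
#  'Supplier: Supplier B; Description: Archive purchase; Value: 30000.0']
print("30000.0" in whole_column[0])  # True: row 0 also contains row 1's value
```

The whole-column version also inflates the token count of every row, which is worth checking whenever `n_tokens` looks unexpectedly large for short fields.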
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 161,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"101"
|
|
]
|
|
},
|
|
"execution_count": 161,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"from transformers import GPT2TokenizerFast\n",
|
|
"tokenizer = GPT2TokenizerFast.from_pretrained(\"gpt2\")\n",
|
|
"\n",
|
|
"df['n_tokens'] = df.combined.apply(lambda x: len(tokenizer.encode(x)))\n",
|
|
"len(df)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 162,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"embedding_path = './data/transactions_with_embeddings_100.csv'"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 163,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from utils.embeddings_utils import get_embedding\n",
"# Both columns called get_embedding with the same default model, so embed each row once and reuse it\n",
"df['babbage_similarity'] = df.combined.apply(lambda x: get_embedding(x))\n",
"df['babbage_search'] = df['babbage_similarity']\n",
"df.to_csv(embedding_path)\n"
|
|
]
|
|
},
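The cell above makes one API request per row. The embeddings endpoint also accepts a list of inputs, so rows can be sent in batches. A minimal sketch follows, assuming a caller-supplied `embed_batch` function and an arbitrary batch size of 100; `embed_in_batches` is a hypothetical helper, not part of `utils.embeddings_utils`. With the v1 `openai` client, `embed_batch` could be something like `lambda texts: [d.embedding for d in client.embeddings.create(input=texts, model="text-embedding-3-small").data]`, where the model name is an assumption rather than something this notebook specifies. The usage below substitutes a fake embedder so it runs offline.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[Vector]],
    batch_size: int = 100,
) -> List[Vector]:
    """Embed texts in chunks of batch_size, preserving input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    vectors: List[Vector] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        result = embed_batch(batch)
        # Guard against a backend returning the wrong number of vectors.
        if len(result) != len(batch):
            raise ValueError("embed_batch returned a different number of vectors")
        vectors.extend(result)
    return vectors


# Offline usage with a fake embedder that records each batch size it receives.
calls = []


def fake_embed(batch):
    calls.append(len(batch))
    return [[float(len(text))] for text in batch]


texts = [f"transaction {i}" for i in range(5)]
vectors = embed_in_batches(texts, fake_embed, batch_size=2)
print(calls)    # [2, 2, 1]
print(vectors)  # [[13.0], [13.0], [13.0], [13.0], [13.0]]
```

In the notebook this would look like `df['babbage_similarity'] = embed_in_batches(df.combined.tolist(), embed_batch)`.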
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Use embeddings for classification\n",
|
|
"\n",
|
|
"Now that we have our embeddings, let's see if classifying these into the categories we've named gives us any more success.\n",
|
|
"\n",
|
|
"For this we'll use a template from the [Classification_using_embeddings](Classification_using_embeddings.ipynb) notebook."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 164,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>Unnamed: 0</th>\n",
|
|
" <th>Date</th>\n",
|
|
" <th>Supplier</th>\n",
|
|
" <th>Description</th>\n",
|
|
" <th>Transaction value (£)</th>\n",
|
|
" <th>Classification</th>\n",
|
|
" <th>combined</th>\n",
|
|
" <th>n_tokens</th>\n",
|
|
" <th>babbage_similarity</th>\n",
|
|
" <th>babbage_search</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>0</td>\n",
|
|
" <td>15/08/2016</td>\n",
|
|
" <td>Creative Video Productions Ltd</td>\n",
|
|
" <td>Kelvin Hall</td>\n",
|
|
" <td>26866</td>\n",
|
|
" <td>Other</td>\n",
|
|
" <td>Supplier: Creative Video Productions Ltd; Desc...</td>\n",
|
|
" <td>136</td>\n",
|
|
" <td>[-0.02898375503718853, -0.02881557121872902, 0...</td>\n",
|
|
" <td>[-0.02879939414560795, -0.02867320366203785, 0...</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>1</td>\n",
|
|
" <td>29/05/2017</td>\n",
|
|
" <td>John Graham Construction Ltd</td>\n",
|
|
" <td>Causewayside Refurbishment</td>\n",
|
|
" <td>74806</td>\n",
|
|
" <td>Building Improvement</td>\n",
|
|
" <td>Supplier: John Graham Construction Ltd; Descri...</td>\n",
|
|
" <td>140</td>\n",
|
|
" <td>[-0.024112487211823463, -0.02881261520087719, ...</td>\n",
|
|
" <td>[-0.024112487211823463, -0.02881261520087719, ...</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td>2</td>\n",
|
|
" <td>29/05/2017</td>\n",
|
|
" <td>Morris & Spottiswood Ltd</td>\n",
|
|
" <td>George IV Bridge Work</td>\n",
|
|
" <td>56448</td>\n",
|
|
" <td>Building Improvement</td>\n",
|
|
" <td>Supplier: Morris & Spottiswood Ltd; Descriptio...</td>\n",
|
|
" <td>141</td>\n",
|
|
" <td>[0.013581369072198868, -0.003978211898356676, ...</td>\n",
|
|
" <td>[0.013593776151537895, -0.0037341134157031775,...</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>3</th>\n",
|
|
" <td>3</td>\n",
|
|
" <td>31/05/2017</td>\n",
|
|
" <td>John Graham Construction Ltd</td>\n",
|
|
" <td>Causewayside Refurbishment</td>\n",
|
|
" <td>164691</td>\n",
|
|
" <td>Building Improvement</td>\n",
|
|
" <td>Supplier: John Graham Construction Ltd; Descri...</td>\n",
|
|
" <td>140</td>\n",
|
|
" <td>[-0.024112487211823463, -0.02881261520087719, ...</td>\n",
|
|
" <td>[-0.024112487211823463, -0.02881261520087719, ...</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>4</th>\n",
|
|
" <td>4</td>\n",
|
|
" <td>24/07/2017</td>\n",
|
|
" <td>John Graham Construction Ltd</td>\n",
|
|
" <td>Causewayside Refurbishment</td>\n",
|
|
" <td>27926</td>\n",
|
|
" <td>Building Improvement</td>\n",
|
|
" <td>Supplier: John Graham Construction Ltd; Descri...</td>\n",
|
|
" <td>140</td>\n",
|
|
" <td>[-0.02408558875322342, -0.02881370671093464, 0...</td>\n",
|
|
" <td>[-0.024109570309519768, -0.02880912832915783, ...</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" Unnamed: 0 Date Supplier \\\n",
|
|
"0 0 15/08/2016 Creative Video Productions Ltd \n",
|
|
"1 1 29/05/2017 John Graham Construction Ltd \n",
|
|
"2 2 29/05/2017 Morris & Spottiswood Ltd \n",
|
|
"3 3 31/05/2017 John Graham Construction Ltd \n",
|
|
"4 4 24/07/2017 John Graham Construction Ltd \n",
|
|
"\n",
|
|
" Description Transaction value (£) Classification \\\n",
|
|
"0 Kelvin Hall 26866 Other \n",
|
|
"1 Causewayside Refurbishment 74806 Building Improvement \n",
|
|
"2 George IV Bridge Work 56448 Building Improvement \n",
|
|
"3 Causewayside Refurbishment 164691 Building Improvement \n",
|
|
"4 Causewayside Refurbishment 27926 Building Improvement \n",
|
|
"\n",
|
|
" combined n_tokens \\\n",
|
|
"0 Supplier: Creative Video Productions Ltd; Desc... 136 \n",
|
|
"1 Supplier: John Graham Construction Ltd; Descri... 140 \n",
|
|
"2 Supplier: Morris & Spottiswood Ltd; Descriptio... 141 \n",
|
|
"3 Supplier: John Graham Construction Ltd; Descri... 140 \n",
|
|
"4 Supplier: John Graham Construction Ltd; Descri... 140 \n",
|
|
"\n",
|
|
" babbage_similarity \\\n",
|
|
"0 [-0.02898375503718853, -0.02881557121872902, 0... \n",
|
|
"1 [-0.024112487211823463, -0.02881261520087719, ... \n",
|
|
"2 [0.013581369072198868, -0.003978211898356676, ... \n",
|
|
"3 [-0.024112487211823463, -0.02881261520087719, ... \n",
|
|
"4 [-0.02408558875322342, -0.02881370671093464, 0... \n",
|
|
"\n",
|
|
" babbage_search \n",
|
|
"0 [-0.02879939414560795, -0.02867320366203785, 0... \n",
|
|
"1 [-0.024112487211823463, -0.02881261520087719, ... \n",
|
|
"2 [0.013593776151537895, -0.0037341134157031775,... \n",
|
|
"3 [-0.024112487211823463, -0.02881261520087719, ... \n",
|
|
"4 [-0.024109570309519768, -0.02880912832915783, ... "
|
|
]
|
|
},
|
|
"execution_count": 164,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"from sklearn.ensemble import RandomForestClassifier\n",
|
|
"from sklearn.model_selection import train_test_split\n",
|
|
"from sklearn.metrics import classification_report\n",
|
|
"from ast import literal_eval\n",
|
|
"\n",
|
|
"fs_df = pd.read_csv(embedding_path)\n",
|
|
"fs_df[\"babbage_similarity\"] = fs_df.babbage_similarity.apply(literal_eval).apply(np.array)\n",
|
|
"fs_df.head()\n"
|
|
]
|
|
},
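`to_csv` stores each embedding list as its string representation, which is why the cell above parses the column with `literal_eval` before turning it into arrays. The extra `Unnamed: 0` column comes from the DataFrame index that `to_csv` writes by default; reading with `index_col=0` (or writing with `index=False`) avoids it. The sketch below round-trips hypothetical 3-dimensional vectors through an in-memory CSV to show both points.

```python
import io
from ast import literal_eval

import numpy as np
import pandas as pd

# Hypothetical frame with short 3-dimensional "embeddings".
df = pd.DataFrame(
    {
        "Classification": ["Building Improvement", "Utility Bills"],
        "embedding": [[0.1, -0.2, 0.3], [0.0, 0.5, -0.5]],
    }
)

buffer = io.StringIO()
df.to_csv(buffer)  # also writes the index, which would read back as "Unnamed: 0"
buffer.seek(0)

# index_col=0 restores the index instead of creating an "Unnamed: 0" column.
loaded = pd.read_csv(buffer, index_col=0)
print(type(loaded["embedding"].iloc[0]))  # <class 'str'>

# Parse each string back into a list, then into a NumPy array.
loaded["embedding"] = loaded["embedding"].apply(lambda s: np.array(literal_eval(s)))
X = np.vstack(loaded["embedding"].tolist())
print(X.shape)  # (2, 3)
```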
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 165,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
" precision recall f1-score support\n",
|
|
"\n",
|
|
"Building Improvement 0.92 1.00 0.96 11\n",
|
|
"Literature & Archive 1.00 1.00 1.00 3\n",
|
|
" Other 0.00 0.00 0.00 1\n",
|
|
" Software/IT 1.00 1.00 1.00 1\n",
|
|
" Utility Bills 1.00 1.00 1.00 5\n",
|
|
"\n",
|
|
" accuracy 0.95 21\n",
|
|
" macro avg 0.78 0.80 0.79 21\n",
|
|
" weighted avg 0.91 0.95 0.93 21\n",
|
|
"\n"
|
|
]
|
|
}
],
"source": [
"X_train, X_test, y_train, y_test = train_test_split(\n",
" list(fs_df.babbage_similarity.values), fs_df.Classification, test_size=0.2, random_state=42\n",
")\n",
"\n",
"clf = RandomForestClassifier(n_estimators=100)\n",
"clf.fit(X_train, y_train)\n",
"preds = clf.predict(X_test)\n",
"probas = clf.predict_proba(X_test)\n",
"\n",
"# zero_division=0 reports 0.0 for classes with no predicted samples (here 'Other') without a warning\n",
"report = classification_report(y_test, preds, zero_division=0)\n",
"print(report)\n"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Performance for this model is strong: 95% accuracy and a weighted F1 of 0.93 on the 21 held-out transactions. The only miss is the single `Other` example. With a test set this small, these numbers are indicative rather than conclusive. Still, creating embeddings and training even a simple classifier on them looks like an effective approach, with the zero-shot classifier providing the initial labels for the unlabelled dataset.\n",
"\n",
"Let's take it one step further and see whether a fine-tuned model trained on the same labelled dataset gives comparable results."
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Fine-tuned Transaction Classification\n",
"\n",
"For this use case we'll try to improve on the classifiers above by fine-tuning a model on the same labelled set of 101 transactions. We'll then apply the fine-tuned model to a group of transactions it has not seen during training."
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Building the Fine-tuned Classifier\n",
"\n",
"We need to prepare the data first. The steps are:\n",
"- To build the training and validation sets, we create one message sequence per transaction. Each sequence has a `user` message containing the prompt formatted with the transaction details, followed by an `assistant` message containing the expected class label.\n",
"- The test set reuses the validation sequences. We keep the initial user prompt as the model input and the final assistant message as the expected class label, then use the fine-tuned model to generate its own classification for each transaction."
|
|
]
|
|
},
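{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, each line of the training and validation JSONL files holds a single JSON object with a `messages` list. The record below is hypothetical: `example_record` is an illustrative name, the prompt is a placeholder for whatever `format_prompt` produces, and the label is one of our five classes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"# Illustrative only: the prompt text is a placeholder, not taken from the dataset\n",
"example_record = {\n",
"    \"messages\": [\n",
"        {\"role\": \"user\", \"content\": \"<prompt produced by format_prompt for one transaction>\"},\n",
"        {\"role\": \"assistant\", \"content\": \"Building Improvement\"},\n",
"    ]\n",
"}\n",
"\n",
"# Each line of the JSONL files is one record serialised like this\n",
"print(json.dumps(example_record))"
]
},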
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 64,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"101"
|
|
]
|
|
},
|
|
"execution_count": 64,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"ft_prep_df = fs_df.copy()\n",
|
|
"len(ft_prep_df)\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 65,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>Unnamed: 0</th>\n",
|
|
" <th>Date</th>\n",
|
|
" <th>Supplier</th>\n",
|
|
" <th>Description</th>\n",
|
|
" <th>Transaction value (£)</th>\n",
|
|
" <th>Classification</th>\n",
|
|
" <th>combined</th>\n",
|
|
" <th>n_tokens</th>\n",
|
|
" <th>babbage_similarity</th>\n",
|
|
" <th>babbage_search</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>0</td>\n",
|
|
" <td>15/08/2016</td>\n",
|
|
" <td>Creative Video Productions Ltd</td>\n",
|
|
" <td>Kelvin Hall</td>\n",
|
|
" <td>26866</td>\n",
|
|
" <td>Other</td>\n",
|
|
" <td>Supplier: Creative Video Productions Ltd; Desc...</td>\n",
|
|
" <td>136</td>\n",
|
|
" <td>[-0.028885245323181152, -0.028660893440246582,...</td>\n",
|
|
" <td>[-0.02879939414560795, -0.02867320366203785, 0...</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>1</td>\n",
|
|
" <td>29/05/2017</td>\n",
|
|
" <td>John Graham Construction Ltd</td>\n",
|
|
" <td>Causewayside Refurbishment</td>\n",
|
|
" <td>74806</td>\n",
|
|
" <td>Building Improvement</td>\n",
|
|
" <td>Supplier: John Graham Construction Ltd; Descri...</td>\n",
|
|
" <td>140</td>\n",
|
|
" <td>[-0.024112487211823463, -0.02881261520087719, ...</td>\n",
|
|
" <td>[-0.02414606139063835, -0.02883070334792137, 0...</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td>2</td>\n",
|
|
" <td>29/05/2017</td>\n",
|
|
" <td>Morris & Spottiswood Ltd</td>\n",
|
|
" <td>George IV Bridge Work</td>\n",
|
|
" <td>56448</td>\n",
|
|
" <td>Building Improvement</td>\n",
|
|
" <td>Supplier: Morris & Spottiswood Ltd; Descriptio...</td>\n",
|
|
" <td>141</td>\n",
|
|
" <td>[0.013593776151537895, -0.0037341134157031775,...</td>\n",
|
|
" <td>[0.013561442494392395, -0.004199974238872528, ...</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>3</th>\n",
|
|
" <td>3</td>\n",
|
|
" <td>31/05/2017</td>\n",
|
|
" <td>John Graham Construction Ltd</td>\n",
|
|
" <td>Causewayside Refurbishment</td>\n",
|
|
" <td>164691</td>\n",
|
|
" <td>Building Improvement</td>\n",
|
|
" <td>Supplier: John Graham Construction Ltd; Descri...</td>\n",
|
|
" <td>140</td>\n",
|
|
" <td>[-0.024112487211823463, -0.02881261520087719, ...</td>\n",
|
|
" <td>[-0.024112487211823463, -0.02881261520087719, ...</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>4</th>\n",
|
|
" <td>4</td>\n",
|
|
" <td>24/07/2017</td>\n",
|
|
" <td>John Graham Construction Ltd</td>\n",
|
|
" <td>Causewayside Refurbishment</td>\n",
|
|
" <td>27926</td>\n",
|
|
" <td>Building Improvement</td>\n",
|
|
" <td>Supplier: John Graham Construction Ltd; Descri...</td>\n",
|
|
" <td>140</td>\n",
|
|
" <td>[-0.024112487211823463, -0.02881261520087719, ...</td>\n",
|
|
" <td>[-0.024112487211823463, -0.02881261520087719, ...</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" Unnamed: 0 Date Supplier \\\n",
|
|
"0 0 15/08/2016 Creative Video Productions Ltd \n",
|
|
"1 1 29/05/2017 John Graham Construction Ltd \n",
|
|
"2 2 29/05/2017 Morris & Spottiswood Ltd \n",
|
|
"3 3 31/05/2017 John Graham Construction Ltd \n",
|
|
"4 4 24/07/2017 John Graham Construction Ltd \n",
|
|
"\n",
|
|
" Description Transaction value (£) Classification \\\n",
|
|
"0 Kelvin Hall 26866 Other \n",
|
|
"1 Causewayside Refurbishment 74806 Building Improvement \n",
|
|
"2 George IV Bridge Work 56448 Building Improvement \n",
|
|
"3 Causewayside Refurbishment 164691 Building Improvement \n",
|
|
"4 Causewayside Refurbishment 27926 Building Improvement \n",
|
|
"\n",
|
|
" combined n_tokens \\\n",
|
|
"0 Supplier: Creative Video Productions Ltd; Desc... 136 \n",
|
|
"1 Supplier: John Graham Construction Ltd; Descri... 140 \n",
|
|
"2 Supplier: Morris & Spottiswood Ltd; Descriptio... 141 \n",
|
|
"3 Supplier: John Graham Construction Ltd; Descri... 140 \n",
|
|
"4 Supplier: John Graham Construction Ltd; Descri... 140 \n",
|
|
"\n",
|
|
" babbage_similarity \\\n",
|
|
"0 [-0.028885245323181152, -0.028660893440246582,... \n",
|
|
"1 [-0.024112487211823463, -0.02881261520087719, ... \n",
|
|
"2 [0.013593776151537895, -0.0037341134157031775,... \n",
|
|
"3 [-0.024112487211823463, -0.02881261520087719, ... \n",
|
|
"4 [-0.024112487211823463, -0.02881261520087719, ... \n",
|
|
"\n",
|
|
" babbage_search \n",
|
|
"0 [-0.02879939414560795, -0.02867320366203785, 0... \n",
|
|
"1 [-0.02414606139063835, -0.02883070334792137, 0... \n",
|
|
"2 [0.013561442494392395, -0.004199974238872528, ... \n",
|
|
"3 [-0.024112487211823463, -0.02881261520087719, ... \n",
|
|
"4 [-0.024112487211823463, -0.02881261520087719, ... "
|
|
]
|
|
},
|
|
"execution_count": 65,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"ft_prep_df.head()\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 96,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"( class_id class\n",
|
|
" 0 0 Other\n",
|
|
" 1 1 Literature & Archive\n",
|
|
" 2 2 Software/IT\n",
|
|
" 3 3 Utility Bills\n",
|
|
" 4 4 Building Improvement,\n",
|
|
" 5)"
|
|
]
|
|
},
|
|
"execution_count": 96,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"classes = list(set(ft_prep_df['Classification']))\n",
|
|
"class_df = pd.DataFrame(classes).reset_index()\n",
|
|
"class_df.columns = ['class_id','class']\n",
|
|
"class_df , len(class_df)\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 181,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>messages</th>\n",
|
|
" <th>class</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>[{'role': 'user', 'content': 'You are a data e...</td>\n",
|
|
" <td>Other</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>[{'role': 'user', 'content': 'You are a data e...</td>\n",
|
|
" <td>Building Improvement</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td>[{'role': 'user', 'content': 'You are a data e...</td>\n",
|
|
" <td>Building Improvement</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>3</th>\n",
|
|
" <td>[{'role': 'user', 'content': 'You are a data e...</td>\n",
|
|
" <td>Building Improvement</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>4</th>\n",
|
|
" <td>[{'role': 'user', 'content': 'You are a data e...</td>\n",
|
|
" <td>Building Improvement</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" messages class\n",
|
|
"0 [{'role': 'user', 'content': 'You are a data e... Other\n",
|
|
"1 [{'role': 'user', 'content': 'You are a data e... Building Improvement\n",
|
|
"2 [{'role': 'user', 'content': 'You are a data e... Building Improvement\n",
|
|
"3 [{'role': 'user', 'content': 'You are a data e... Building Improvement\n",
|
|
"4 [{'role': 'user', 'content': 'You are a data e... Building Improvement"
|
|
]
|
|
},
|
|
"execution_count": 181,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"ft_df_with_class = ft_prep_df.merge(class_df,left_on='Classification',right_on='class',how='inner')\n",
|
|
"\n",
|
|
"# Build the chat messages for each fine-tuning example: the user message is the formatted prompt and the assistant message is the expected response (the class label)\n",
|
|
"ft_df_with_class['messages'] = ft_df_with_class.apply(lambda x: [{\"role\": \"user\", \"content\": format_prompt(x)}, {\"role\": \"assistant\", \"content\": x['class']}],axis=1)\n",
|
|
"ft_df_with_class[['messages', 'class']].head()\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 169,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
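"import json\n",
"\n",
"# Create an 80/20 train/validation split. Each sample is one transaction's list of chat messages,\n",
"# so train_df and valid_df are plain Python lists rather than DataFrames\n",
"samples = ft_df_with_class[\"messages\"].tolist()\n",
"train_df, valid_df = train_test_split(samples, test_size=0.2, random_state=42)\n",
"\n",
"def write_to_jsonl(list_of_messages, filename):\n",
"    \"\"\"Write one {\"messages\": [...]} object per line, the chat format used for fine-tuning.\"\"\"\n",
"    with open(filename, \"w\", encoding=\"utf-8\") as f:\n",
"        for messages in list_of_messages:\n",
"            record = {\"messages\": messages}\n",
"            f.write(json.dumps(record) + \"\\n\")\n"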
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 186,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Write the train/validation split to jsonl files\n",
|
|
"train_file_name, valid_file_name = \"transactions_grouped_train.jsonl\", \"transactions_grouped_valid.jsonl\"\n",
|
|
"write_to_jsonl(train_df, train_file_name)\n",
|
|
"write_to_jsonl(valid_df, valid_file_name)\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Upload the files to OpenAI, closing each file handle once the upload completes\n",
"with open(train_file_name, \"rb\") as f:\n",
"    train_file = client.files.create(file=f, purpose=\"fine-tune\")\n",
"with open(valid_file_name, \"rb\") as f:\n",
"    valid_file = client.files.create(file=f, purpose=\"fine-tune\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Create the fine-tuning job\n",
|
|
"fine_tuning_job = client.fine_tuning.jobs.create(training_file=train_file.id, validation_file=valid_file.id, model=\"gpt-4o-2024-08-06\")\n",
|
|
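"# Check the job status. Fine-tuning runs asynchronously, so the fine-tuned model name\n",
"# is only populated once the job has succeeded\n",
"status = client.fine_tuning.jobs.retrieve(fine_tuning_job.id)\n",
"print(status.status)"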
|
|
]
|
|
},
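{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Fine-tuning jobs run asynchronously and can take a while, and the next cell assumes the job has already finished. A minimal polling sketch follows. It assumes the `client` and `fine_tuning_job` objects from the cell above, and the 60-second interval and the `terminal_statuses` name are arbitrary choices rather than anything the API requires."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import time\n",
"\n",
"# Job statuses after which the job will not change any more\n",
"terminal_statuses = {\"succeeded\", \"failed\", \"cancelled\"}\n",
"\n",
"job = client.fine_tuning.jobs.retrieve(fine_tuning_job.id)\n",
"while job.status not in terminal_statuses:\n",
"    print(f\"Job status: {job.status}\")\n",
"    time.sleep(60)  # arbitrary polling interval\n",
"    job = client.fine_tuning.jobs.retrieve(fine_tuning_job.id)\n",
"\n",
"print(f\"Final status: {job.status}\")"
]
},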
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 209,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Fine tuned model id: ft:gpt-4o-2024-08-06:openai::BKr3Xy8U\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"# Once the fine-tuning job is complete, you can retrieve the model name from the job status\n",
|
|
"fine_tuned_model = client.fine_tuning.jobs.retrieve(fine_tuning_job.id).fine_tuned_model\n",
|
|
"print(f\"Fine tuned model id: {fine_tuned_model}\")"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Applying the Fine-tuned Classifier\n",
"\n",
"Now we'll apply our classifier to see how it performs. We only had 31 unique observations in our training set and 8 in our validation set (out of 80 and 21 examples respectively), so let's see how well it does with so little data."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 210,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>messages</th>\n",
|
|
" <th>expected_class</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>[{'role': 'user', 'content': 'You are a data e...</td>\n",
|
|
" <td>Utility Bills</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>[{'role': 'user', 'content': 'You are a data e...</td>\n",
|
|
" <td>Literature & Archive</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td>[{'role': 'user', 'content': 'You are a data e...</td>\n",
|
|
" <td>Literature & Archive</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>3</th>\n",
|
|
" <td>[{'role': 'user', 'content': 'You are a data e...</td>\n",
|
|
" <td>Literature & Archive</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>4</th>\n",
|
|
" <td>[{'role': 'user', 'content': 'You are a data e...</td>\n",
|
|
" <td>Building Improvement</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" messages expected_class\n",
|
|
"0 [{'role': 'user', 'content': 'You are a data e... Utility Bills\n",
|
|
"1 [{'role': 'user', 'content': 'You are a data e... Literature & Archive\n",
|
|
"2 [{'role': 'user', 'content': 'You are a data e... Literature & Archive\n",
|
|
"3 [{'role': 'user', 'content': 'You are a data e... Literature & Archive\n",
|
|
"4 [{'role': 'user', 'content': 'You are a data e... Building Improvement"
|
|
]
|
|
},
|
|
"execution_count": 210,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"# The validation file doubles as the test set: the final (assistant) message of each\n",
"# sequence holds the expected class label\n",
"test_set = pd.read_json(valid_file_name, lines=True)\n",
"test_set['expected_class'] = test_set['messages'].apply(lambda messages: messages[-1]['content'])\n",
"test_set.head()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 211,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>messages</th>\n",
|
|
" <th>expected_class</th>\n",
|
|
" <th>response</th>\n",
|
|
" <th>predicted_class</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>[{'role': 'user', 'content': 'You are a data e...</td>\n",
|
|
" <td>Utility Bills</td>\n",
|
|
" <td>ChatCompletion(id='chatcmpl-BKrC0S1wQSfM9ZQfcC...</td>\n",
|
|
" <td>Utility Bills</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>[{'role': 'user', 'content': 'You are a data e...</td>\n",
|
|
" <td>Literature & Archive</td>\n",
|
|
" <td>ChatCompletion(id='chatcmpl-BKrC1BTr0DagbDkC2s...</td>\n",
|
|
" <td>Literature & Archive</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td>[{'role': 'user', 'content': 'You are a data e...</td>\n",
|
|
" <td>Literature & Archive</td>\n",
|
|
" <td>ChatCompletion(id='chatcmpl-BKrC1H3ZeIW5cz2Owr...</td>\n",
|
|
" <td>Literature & Archive</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>3</th>\n",
|
|
" <td>[{'role': 'user', 'content': 'You are a data e...</td>\n",
|
|
" <td>Literature & Archive</td>\n",
|
|
" <td>ChatCompletion(id='chatcmpl-BKrC1wdhaMP0Q7YmYx...</td>\n",
|
|
" <td>Literature & Archive</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>4</th>\n",
|
|
" <td>[{'role': 'user', 'content': 'You are a data e...</td>\n",
|
|
" <td>Building Improvement</td>\n",
|
|
" <td>ChatCompletion(id='chatcmpl-BKrC20c5pkpngy1xDu...</td>\n",
|
|
" <td>Building Improvement</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" messages expected_class \\\n",
|
|
"0 [{'role': 'user', 'content': 'You are a data e... Utility Bills \n",
|
|
"1 [{'role': 'user', 'content': 'You are a data e... Literature & Archive \n",
|
|
"2 [{'role': 'user', 'content': 'You are a data e... Literature & Archive \n",
|
|
"3 [{'role': 'user', 'content': 'You are a data e... Literature & Archive \n",
|
|
"4 [{'role': 'user', 'content': 'You are a data e... Building Improvement \n",
|
|
"\n",
|
|
" response predicted_class \n",
|
|
"0 ChatCompletion(id='chatcmpl-BKrC0S1wQSfM9ZQfcC... Utility Bills \n",
|
|
"1 ChatCompletion(id='chatcmpl-BKrC1BTr0DagbDkC2s... Literature & Archive \n",
|
|
"2 ChatCompletion(id='chatcmpl-BKrC1H3ZeIW5cz2Owr... Literature & Archive \n",
|
|
"3 ChatCompletion(id='chatcmpl-BKrC1wdhaMP0Q7YmYx... Literature & Archive \n",
|
|
"4 ChatCompletion(id='chatcmpl-BKrC20c5pkpngy1xDu... Building Improvement "
|
|
]
|
|
},
|
|
"execution_count": 211,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"# Apply the fine-tuned model to the test set: send every message except the final\n",
"# assistant answer (the expected label), with temperature=0 for consistent outputs\n",
"test_set['response'] = test_set.apply(lambda x: client.chat.completions.create(model=fine_tuned_model, messages=x['messages'][:-1], temperature=0), axis=1)\n",
"test_set['predicted_class'] = test_set.apply(lambda x: x['response'].choices[0].message.content, axis=1)\n",
"\n",
"test_set.head()"
|
|
]
|
|
},
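{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"To compare like for like with the embeddings classifier, we can produce the same per-class report for the fine-tuned predictions. This is a minimal sketch that assumes the `test_set` DataFrame built above. Whitespace is stripped from both label columns, and `zero_division=0` avoids warnings for any class that receives no predictions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.metrics import classification_report\n",
"\n",
"# Strip whitespace so formatting differences in the model output are not counted as errors\n",
"print(classification_report(\n",
"    test_set['expected_class'].str.strip(),\n",
"    test_set['predicted_class'].str.strip(),\n",
"    zero_division=0,\n",
"))"
]
},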
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 212,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"result\n",
|
|
"True 20\n",
|
|
"False 1\n",
|
|
"Name: count, dtype: int64\n",
|
|
"F1 Score: 0.9296066252587991\n",
|
|
"Raw Accuracy: 0.9523809523809523\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"# Calculate the accuracy of the predictions\n",
"from sklearn.metrics import f1_score\n",
"\n",
"# A prediction is correct when it matches the expected label, ignoring surrounding whitespace\n",
"test_set['result'] = test_set.apply(lambda x: str(x['predicted_class']).strip() == str(x['expected_class']).strip(), axis=1)\n",
"print(test_set['result'].value_counts())\n",
"\n",
"print(\"F1 Score:\", f1_score(test_set['expected_class'], test_set['predicted_class'], average=\"weighted\"))\n",
"# The mean of a boolean column is the fraction of correct predictions\n",
"print(\"Raw Accuracy:\", test_set['result'].mean())\n"
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "cookbook_env",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.11.8"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 4
|
|
}
|