openai-cookbook/examples/o1/Using_reasoning_for_data_validation.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Using reasoning for data validation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this guide, we’ll explore how to use the o1 model, specifically o1-preview, to perform data validation through reasoning. We’ll walk through a practical example involving a synthetic medical dataset and demonstrate how to assess the model’s accuracy in identifying issues within the data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Overview\n",
    "\n",
    "Data validation is a critical step in ensuring the quality and reliability of datasets, especially in sensitive fields like healthcare. Traditional validation methods often rely on predefined rules and patterns. However, advanced  models like o1 can understand context and reason about data, offering a more flexible and intelligent approach to validation.\n",
    "\n",
    "In this tutorial, we will:\n",
    "- Generate a synthetic dataset of medical data that contains inconsistencies.\n",
    "- Define a function that takes in a row of data and validates its accuracy\n",
    "- Run the validation process and compute accuracy metrics.\n",
    "- Analyze and interpret the results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "from openai import OpenAI\n",
    "import json\n",
    "from IPython.display import display, HTML\n",
    "from sklearn.metrics import precision_score, recall_score, f1_score\n",
    "from concurrent.futures import ThreadPoolExecutor, as_completed\n",
    "import csv\n",
    "import pandas as pd\n",
    "\n",
    "client = OpenAI()\n",
    "MODEL = 'o1-preview'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Synthetic Data Generation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will use a lot of the principles described in the [Synthetic Data Generation](https://cookbook.openai.com/examples/sdg1) cookbook to create the foundation of our dataset.\n",
    "\n",
    "We will prompt the model to generate sets of medical data for our use case. We have provided detailed instructions to the model on how to create the dataset, what format to follow, and how to fill it with inaccuracies. We also provide a few rows of sample data to get the model started. \n",
    "\n",
    "Each row in the dataset will have the following fields:\n",
    "- Patient ID: A randomly generated patient id\n",
    "- Date of Birth: Date of birth of the patient\n",
    "- Gender: M/F\n",
    "- Medical History: Past diagnoses\n",
    "- Current Medications: Medication the patient is taking\n",
    "- Allergies: Identified allergies\n",
    "- Lab Results (Glucose mg/dL)\n",
    "- Diagnoses: Current diagnosis\n",
    "- Treatment Plan: Current treatment plan\n",
    "- Is Valid: Whether or not the current row of data is valid (True/False)\n",
    "- Issue: If the row of data is not valid, what the issue is\n",
    "\n",
    "Some examples of inaccuracies that may be present in the data are:\n",
    "- Prescribing medications that the patient is allergic to\n",
    "- Current medications do not match medical history\n",
    "- Treatment plan does not match diagnosis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "def generate_data():\n",
    "    messages = [\n",
    "        {\n",
    "            \"role\": \"user\",\n",
    "            \"content\": \"\"\"\n",
    "You are a helpful assistant designed to generate data. You will be given a format for the data to generate and some examples of the data.\n",
    "\n",
    "When generating Patient IDs, use the format 'P' followed by a three-digit number (e.g., P006, P941, P319).\n",
    "\n",
    "Intentionally make some mistakes in the data generation and document them in the appropriate columns ('Is Valid' and 'Issue') if the row of data is invalid.\n",
    "\n",
    "The types of mistakes to include are:\n",
    "\n",
    "- **Allergy Contradictions**: Prescribing a medication that the patient is allergic to (e.g., prescribing Penicillin to a patient allergic to Penicillin).\n",
    "- **Medical History and Medication Mismatch**: A patient with a medical condition not receiving appropriate medication (e.g., a diabetic patient not prescribed any diabetes medication).\n",
    "- **Lab Results and Diagnosis Mismatch**: Lab results that do not support the diagnosis (e.g., normal glucose levels but diagnosed with Diabetes Type 2).\n",
    "- **Other Plausible Mistakes**: Any other realistic errors that could occur in medical records, such as incorrect gender entries, impossible dates of birth, or inconsistent treatment plans.\n",
    "\n",
    "Ensure that when 'Is Valid' is 'False', the 'Issue' column clearly explains the problem.\n",
    "\n",
    "Return 100 rows of data for the user. Your response should strictly be in the format of a valid CSV.\n",
    "\n",
    "Generate Synthetic Medical Records Dataset with the following columns:\n",
    "    - Patient ID: A randomly generated patient id\n",
    "    - Date of Birth: Date of birth of the patient\n",
    "    - Gender: M/F\n",
    "    - Medical History: Past diagnoses\n",
    "    - Current Medications: Medication the patient is taking\n",
    "    - Allergies: Identified allergies\n",
    "    - Lab Results (Glucose mg/dL)\n",
    "    - Diagnoses: Current diagnosis\n",
    "    - Treatment Plan: Current treatment plan\n",
    "    - Is Valid: Whether or not the current row of data is valid (True/False)\n",
    "    - Issue: If the row of data is not valid, what the issue is\n",
    "\n",
    "Patient ID,Date of Birth,Gender,Medical History,Current Medications,Allergies,Lab Results (Glucose mg/dL),Diagnoses,Treatment Plan,Is Valid,Issue\n",
    "P001,1980-05-14,M,Hypertension,Lisinopril,None,110,Hypertension,Continue Lisinopril,True,\n",
    "P002,1975-11-30,F,Diabetes Type 2,Metformin,Penicillin,90,Diabetes Type 2,Continue Metformin,True,\n",
    "P003,1990-07-22,F,Asthma,Albuterol,Aspirin,85,Asthma,Prescribe Albuterol,True,\n",
    "P004,2000-03-10,M,None,Amoxicillin,Penicillin,95,Infection,Prescribe Amoxicillin,False,Prescribed Amoxicillin despite Penicillin allergy\n",
    "P005,1985-09-18,F,Hyperlipidemia,Atorvastatin,None,200,Hyperlipidemia,Continue Atorvastatin,True,\n",
    "P006,1978-12-05,M,Hypertension; Diabetes Type 2,Lisinopril; Insulin,None,55,Diabetes Type 2,Adjust insulin dosage,False,Low glucose level not properly addressed\n",
    "            \"\"\"\n",
    "        }\n",
    "    ]\n",
    "\n",
    "    response = client.chat.completions.create(\n",
    "        model=MODEL,\n",
    "        messages=messages\n",
    "    )\n",
    "\n",
    "    return response.choices[0].message.content.replace('```csv', '').replace('```', '')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Synthetic data generation and appending completed.\n"
     ]
    }
   ],
   "source": [
    "# Generate data three times using the existing dataGeneration function\n",
    "generated_data = []\n",
    "data = generate_data()\n",
    "generated_data.extend(data.strip().split('\\n'))\n",
    "\n",
    "# Append the generated data to the medicalData.csv file\n",
    "with open('../data/medicalData.csv', 'a', newline='') as csvfile:\n",
    "    csvwriter = csv.writer(csvfile)\n",
    "    for row in generated_data:\n",
    "        csvwriter.writerow(row.split(','))\n",
    "\n",
    "print(\"Synthetic data generation and appending completed.\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data Validation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we have our dataset prepared, we will prompt the reasoning model to review each row of data and determine whether or not it contains an issue. We will ask the model to output whether or not there is an issue in the data and then offer an explanation of the issue.\n",
    "\n",
    "Once we have the model determine its list of invalid data, we will pass those results on to a model grader to assess two metrics:\n",
    "- Accuracy of the model's ability correctly identify issues with the data\n",
    "- For the subset of data that issues have been correctly identified, what is the accuracy of the model in identifying the issue at hand\n",
    "\n",
    "Given that this task is much more narrow, we can use the faster gpt-4o model to calculate the accuracy.\n",
    "\n",
    "REMINDER: Given that these models are still in beta, rate limits will be significantly reduced. Please adjust the number of concurrent workers accordingly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "def validate_data(input_data):\n",
    "    messages = [\n",
    "        {\n",
    "            \"role\": \"user\",\n",
    "            \"content\": f\"\"\"\n",
    "You are a helpful assistant designed to validate the quality of medical datasets. You will be given a single row of medical data, and your task is to determine whether the data is valid.\n",
    "\n",
    "- Carefully analyze the data for any inconsistencies, contradictions, missing values, or implausible information.\n",
    "- Consider the logical relationships between different fields (e.g., treatments should be appropriate for the diagnoses, medications should not conflict with allergies, lab results should be consistent with diagnoses, etc.).\n",
    "- Use your general medical knowledge to assess the validity of the data.\n",
    "- Focus solely on the information provided without making assumptions beyond the given data.\n",
    "\n",
    "**Return only a JSON object** with the following two properties:\n",
    "\n",
    "- `\"is_valid\"`: a boolean (`true` or `false`) indicating whether the data is valid.\n",
    "- `\"issue\"`: if `\"is_valid\"` is `false`, provide a brief explanation of the issue; if `\"is_valid\"` is `true`, set `\"issue\"` to `null`.\n",
    "\n",
    "Both JSON properties must always be present.\n",
    "\n",
    "Do not include any additional text or explanations outside the JSON object.\n",
    "\n",
    "MEDICAL DATA:\n",
    "{input_data}\n",
    "            \"\"\"\n",
    "        }\n",
    "    ]\n",
    "\n",
    "    response = client.chat.completions.create(\n",
    "        model=MODEL,\n",
    "        messages=messages\n",
    "    )\n",
    "\n",
    "    response_content = response.choices[0].message.content.replace('```json', '').replace('```', '').strip()\n",
    "    \n",
    "    try:\n",
    "        if isinstance(response_content, dict):\n",
    "            response_dict = response_content\n",
    "        else:\n",
    "            response_dict = json.loads(response_content)\n",
    "        return response_dict\n",
    "    except json.JSONDecodeError as e:\n",
    "        print(f\"Failed to decode JSON response: {response_content}\")\n",
    "        raise e"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Read the CSV file and exclude the last two columns\n",
    "input_data = []\n",
    "with open('../data/medicalData.csv', 'r') as file:\n",
    "    reader = csv.reader(file)\n",
    "    headers = next(reader)\n",
    "    for row in reader:\n",
    "        input_data.append(row[:-2])  # Exclude \"Is Valid\" and \"Issue\" columns\n",
    "\n",
    "# Initialize lists to store true labels\n",
    "true_is_valid = []\n",
    "true_issues = []\n",
    "\n",
    "# Extract true labels from the CSV file\n",
    "with open('../data/medicalData.csv', 'r') as file:\n",
    "    reader = csv.reader(file)\n",
    "    headers = next(reader)\n",
    "    for row in reader:\n",
    "        true_is_valid.append(row[-2] == 'True')\n",
    "        true_issues.append(row[-1])\n",
    "\n",
    "# Function to validate a single row of data\n",
    "def validate_row(row):\n",
    "    input_str = ','.join(row)\n",
    "    result_json = validate_data(input_str)\n",
    "    return result_json\n",
    "\n",
    "# Validate data rows and collect results\n",
    "pred_is_valid = [False] * len(input_data)\n",
    "pred_issues = [''] * len(input_data)\n",
    "\n",
    "with ThreadPoolExecutor() as executor:\n",
    "    futures = {executor.submit(validate_row, row): i for i, row in enumerate(input_data)}\n",
    "    \n",
    "    for future in as_completed(futures):\n",
    "        i = futures[future]  # Get the index of the current row\n",
    "        result_json = future.result()\n",
    "        pred_is_valid[i] = result_json['is_valid']\n",
    "        pred_issues[i] = result_json['issue']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we have the model's results, we can compare it against the source of truth and determine the system's accuracy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Convert predicted and true 'is_valid' labels to boolean if they aren't already\n",
    "pred_is_valid_bool = [bool(val) if isinstance(val, bool) else val == 'True' for val in pred_is_valid]\n",
    "true_is_valid_bool = [bool(val) if isinstance(val, bool) else val == 'True' for val in true_is_valid]\n",
    "\n",
    "# Calculate precision, recall, and f1 score for the 'is_valid' prediction\n",
    "precision = precision_score(true_is_valid_bool, pred_is_valid_bool, pos_label=True)\n",
    "recall = recall_score(true_is_valid_bool, pred_is_valid_bool, pos_label=True)\n",
    "f1 = f1_score(true_is_valid_bool, pred_is_valid_bool, pos_label=True)\n",
    "\n",
    "# Initialize issue_matches_full with False\n",
    "issue_matches_full = [False] * len(true_is_valid)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Precision: 0.82\n",
      "Recall: 0.87\n",
      "F1: 0.84\n"
     ]
    }
   ],
   "source": [
    "print(f\"Precision: {precision:.2f}\")\n",
    "print(f\"Recall: {recall:.2f}\")\n",
    "print(f\"F1: {f1:.2f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Issue Identification"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will now determine the model's ability to accurately classify the issue in the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "def validate_issue(model_generated_answer, correct_answer):\n",
    "    messages = [\n",
    "        {\n",
    "            \"role\": \"user\",\n",
    "            \"content\": f\"\"\"\n",
    "You are a medical expert assistant designed to validate the quality of an LLM-generated answer.\n",
    "\n",
    "The model was asked to review a medical dataset row to determine if the data is valid. If the data is not valid, it should provide a justification explaining why.\n",
    "\n",
    "Your task:\n",
    "\n",
    "    •\tCompare the model-generated justification with the correct reason provided.\n",
    "    •\tDetermine if they address the same underlying medical issue or concern, even if phrased differently.\n",
    "    •\tFocus on the intent, medical concepts, and implications rather than exact wording.\n",
    "\n",
    "Instructions:\n",
    "\n",
    "    •\tIf the justifications have the same intent or address the same medical issue, return True.\n",
    "    •\tIf they address different issues or concerns, return False.\n",
    "    •\tOnly respond with a single word: True or False.\n",
    "\n",
    "Examples:\n",
    "\n",
    "    1.\tExample 1:\n",
    "    •\tModel Generated Response: “The patient is allergic to penicillin”\n",
    "    •\tCorrect Response: “The patient was prescribed penicillin despite being allergic”\n",
    "    •\tAnswer: True\n",
    "    2.\tExample 2:\n",
    "    •\tModel Generated Response: “The date of birth of the patient is incorrect”\n",
    "    •\tCorrect Response: “The patient was prescribed penicillin despite being allergic”\n",
    "    •\tAnswer: False\n",
    "\n",
    "\n",
    "Model Generated Response: {model_generated_answer}\n",
    "Correct Response:  {correct_answer}\n",
    "            \"\"\"\n",
    "        }\n",
    "    ]\n",
    "\n",
    "    response = client.chat.completions.create(\n",
    "        model=\"o1-preview\",\n",
    "        messages=messages\n",
    "    )\n",
    "\n",
    "    result = response.choices[0].message.content\n",
    "\n",
    "    return result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Validate issues for rows where both true and predicted 'is_valid' are False\n",
    "validation_results = []\n",
    "\n",
    "with ThreadPoolExecutor() as executor:\n",
    "    futures = {\n",
    "        executor.submit(validate_issue, pred_issues[i], true_issues[i]): i\n",
    "        for i in range(len(pred_is_valid_bool))\n",
    "        if not pred_is_valid_bool[i] and not true_is_valid_bool[i]\n",
    "    }\n",
    "    \n",
    "    for future in as_completed(futures):\n",
    "        i = futures[future]  # Get the original index\n",
    "        issue_match = future.result()\n",
    "        issue_matches_full[i] = (issue_match == 'True')\n",
    "        validation_results.append({\n",
    "            \"index\": i,\n",
    "            \"predicted_issue\": pred_issues[i],\n",
    "            \"true_issue\": true_issues[i],\n",
    "            \"issue_match\": issue_matches_full[i]\n",
    "        })\n",
    "    \n",
    "    # Calculate issue accuracy\n",
    "    issue_accuracy = sum([i['issue_match'] for i in validation_results]) / len(validation_results)\n",
    "    \n",
    "    # Store the results in the dictionary\n",
    "    model_results = {\n",
    "        \"precision\": precision,\n",
    "        \"recall\": recall,\n",
    "        \"f1\": f1,\n",
    "        \"issue_accuracy\": issue_accuracy\n",
    "    }\n",
    "\n",
    "# Create a DataFrame to store the results\n",
    "df_results = pd.DataFrame([model_results])\n",
    "\n",
    "# Create a DataFrame to store the validation results for each row\n",
    "df_validation_results = pd.DataFrame(validation_results)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Below we'll display the subset of rows that we correctly identified contained an issue. For each row, we'll show the predicted vs. true issue and whether or not there is a match"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: left;\">\n",
       "      <th></th>\n",
       "      <th>index</th>\n",
       "      <th>predicted_issue</th>\n",
       "      <th>true_issue</th>\n",
       "      <th>issue_match</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>39</td>\n",
       "      <td>Amoxicillin is prescribed to a patient with Penicillin allergy.</td>\n",
       "      <td>Prescribed Amoxicillin despite Penicillin allergy</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>50</td>\n",
       "      <td>Patient diagnosed with Type 1 Diabetes is not on any medications and the treatment field lists the diagnosis instead of appropriate treatment.</td>\n",
       "      <td>Diabetes Type 1 patient not receiving insulin</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>51</td>\n",
       "      <td>Lab result of 300 indicates hyperglycemia but no diagnosis or treatment is recorded.</td>\n",
       "      <td>Extremely high glucose level not diagnosed or treated</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>26</td>\n",
       "      <td>The patient is being prescribed penicillin despite having an allergy to penicillin.</td>\n",
       "      <td>Prescribed Penicillin despite Penicillin allergy</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>31</td>\n",
       "      <td>The patient's age (88) is inconsistent with the date of birth (1996-11-05).</td>\n",
       "      <td>Osteoporosis patient not receiving treatment</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>24</td>\n",
       "      <td>The 'Treatment Plan' field should not be 'Depression'; it should specify the treatment prescribed for depression.</td>\n",
       "      <td>Depression patient not receiving treatment</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>3</td>\n",
       "      <td>Patient is allergic to Penicillin but is prescribed Amoxicillin.</td>\n",
       "      <td>Prescribed Amoxicillin despite Penicillin allergy</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>28</td>\n",
       "      <td>The treatment field contains 'Asthma', which is a diagnosis, not a treatment.</td>\n",
       "      <td>Asthma patient not prescribed any medication</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>7</td>\n",
       "      <td>Patient with asthma and low lab result (100) is treated only with lifestyle modifications without medications, which is inappropriate.</td>\n",
       "      <td>Asthma patient not prescribed any medication</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>16</td>\n",
       "      <td>The patient's age (86) does not match the date of birth (1955-10-10).</td>\n",
       "      <td>COPD patient not receiving treatment</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>53</td>\n",
       "      <td>The age provided (92) is inconsistent with the date of birth (1983-08-19).</td>\n",
       "      <td>Depression patient not receiving treatment</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>23</td>\n",
       "      <td>Treatment field incorrectly lists 'Hyperlipidemia' instead of an appropriate treatment for the diagnosis.</td>\n",
       "      <td>Hyperlipidemia patient not prescribed any medication</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>13</td>\n",
       "      <td>Patient is allergic to sulfa drugs but is prescribed Sulfamethoxazole, which is a sulfa drug.</td>\n",
       "      <td>Prescribed Sulfa drug despite Sulfa allergy</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>98</td>\n",
       "      <td>The patient is prescribed Penicillin despite having a Penicillin allergy.</td>\n",
       "      <td>Prescribed Penicillin despite Penicillin allergy</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>9</td>\n",
       "      <td>Patient has a medication allergy to Penicillin but is prescribed Penicillin.</td>\n",
       "      <td>Prescribed Penicillin despite Penicillin allergy</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>85</td>\n",
       "      <td>Treatment field contains 'Hyperlipidemia', which is a diagnosis, not a treatment.</td>\n",
       "      <td>Hyperlipidemia patient not prescribed any medication</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>18</td>\n",
       "      <td>Prescribed treatment (Aspirin) is not appropriate for the diagnosis of infection.</td>\n",
       "      <td>Prescribed Aspirin despite Aspirin allergy; high glucose level not addressed</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>70</td>\n",
       "      <td>Treatment field contains a diagnosis 'Osteoporosis' instead of a treatment.</td>\n",
       "      <td>Osteoporosis patient not receiving treatment</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>57</td>\n",
       "      <td>Patient is allergic to Penicillin but is being prescribed Amoxicillin, which is contraindicated.</td>\n",
       "      <td>Prescribed Amoxicillin despite Penicillin allergy</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>80</td>\n",
       "      <td>Treatment field incorrectly lists 'Diabetes Type 2' instead of a valid treatment plan.</td>\n",
       "      <td>Diabetes Type 2 patient not receiving medication</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>87</td>\n",
       "      <td>Treatment plan includes prescribing Amoxicillin, which the patient is allergic to.</td>\n",
       "      <td>Prescribed Amoxicillin despite Penicillin allergy</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>37</td>\n",
       "      <td>Treatment field contains 'Hyperlipidemia', which is a diagnosis, not a treatment.</td>\n",
       "      <td>Hyperlipidemia patient not prescribed any medication</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>95</td>\n",
       "      <td>Treatment is listed as 'Asthma', which is not an appropriate treatment for the diagnosis.</td>\n",
       "      <td>Asthma patient not prescribed any medication</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>96</td>\n",
       "      <td>Treatment field lists 'Hyperlipidemia', which is not an appropriate treatment.</td>\n",
       "      <td>Hyperlipidemia patient not prescribed any medication</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>59</td>\n",
       "      <td>Treatment field contains 'Anemia', which is not a valid treatment.</td>\n",
       "      <td>Anemia patient not receiving treatment</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>5</td>\n",
       "      <td>Age does not match date of birth</td>\n",
       "      <td>Low glucose level not properly addressed</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "def display_formatted_dataframe(df):\n",
    "    def format_text(text):\n",
    "        return text.replace('\\n', '<br>')\n",
    "\n",
    "    df_formatted = df.copy()\n",
    "    df_formatted['predicted_issue'] = df_formatted['predicted_issue'].apply(format_text)\n",
    "    df_formatted['true_issue'] = df_formatted['true_issue'].apply(format_text)\n",
    "    \n",
    "    display(HTML(df_formatted.to_html(escape=False, justify='left')))\n",
    "    \n",
    "display_formatted_dataframe(pd.DataFrame(validation_results))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   precision    recall       f1  issue_accuracy\n",
      "0   0.818182  0.870968  0.84375        0.615385\n"
     ]
    }
   ],
   "source": [
    "# Display the DataFrame\n",
    "print(df_results)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conclusion\n",
    "\n",
    "We can see from the results here that we're able to generate a high precision/recall for issue identification as well as decent accuracy for pinpointing the exact issue in the data.\n",
    "\n",
    "This should help streamline data validation for eval sets across a variety of domains."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}