"# Fine tuning on synthetic function-calling data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook covers how to fine-tune to increase function calling accuracy and reliability.\\\n",
"You can find more information on function calling [here](https://github.com/openai/openai-cookbook/blob/main/examples/How_to_call_functions_with_chat_models.ipynb), \n",
"and on fine tuning [here](https://github.com/openai/openai-cookbook/blob/main/examples/How_to_finetune_chat_models.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For context, from the function calling notebook above:\n",
"> `functions` is an optional parameter in the Chat Completion API which can be used to provide function specifications. The purpose of this is to enable models to generate function arguments which adhere to the provided specifications. Note that the API will not actually execute any function calls. It is up to developers to execute function calls using model outputs."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Function calling is a very powerful tool when it functions as intended. However, we have seen that as the number of \\\n",
"functions increases, and the complexity of the task at hand increases, function calling becomes less accurate (e.g.: more hallucinated\n",
"invocations, and incorrect invocations).\\\n",
"Before fine tuning for function calling, it's best to begin with:\n",
"- Improvements to the function definitions. Make them more clear, and more distinct from one another.\n",
"- Experiment with prompt engineering: often a more detailed prompt can help the model call the correct function.\\\n",
"\n",
"*If* the steps above fail to improve function calling to a satisfactory level, then you can try fine tuning for function calling."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Overview"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook contains three sections\n",
"- **Assessing baseline function calling performance:** Evaluating an out-of-the-box `gpt-3.5-turbo` model on our given function (let's assume that for latency + cost reasons we cannot use `gpt-4` for a drone copilot)\n",
"- **Generating synthetic data:** Using `gpt-4` to create 'golden' set of prompts and function invocations to use as training data\n",
"- **Fine-tuning**: Running the fine tuning job, and evaluating the fine-tuned model\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note: *This notebook provides an example of how to create synthetic training data for fine tuning for function calling given just a list of functions. While real-world production test evals are preferable, this method produces strong results and can be used in conjuction with real-world training data.*"
"DRONE_SYSTEM_PROMPT = \"\"\"You are an intelligent AI that controls a drone. Given a command or request from the user,\n",
"call one of your functions to complete the request. If the request cannot be completed by your available functions, call the reject_request function.\n",
"If the request is ambiguous or unclear, reject the request.\"\"\"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's define functions for all of the actions the copilot can take."
" \"description\": \"Color for the LED display. Not required if pattern is 'rainbow'.\"\n",
" }\n",
" },\n",
" \"required\": [\"pattern\"]\n",
" }\n",
" },\n",
" {\n",
" \"name\": \"set_home_location\",\n",
" \"description\": \"Set or change the home location for the drone.\",\n",
" \"parameters\": {\n",
" \"type\": \"object\",\n",
" \"properties\": {\n",
" \"coordinates\": {\n",
" \"type\": \"object\",\n",
" \"description\": \"GPS coordinates for the home location.\"\n",
" }\n",
" },\n",
" \"required\": [\"coordinates\"]\n",
" }\n",
" },\n",
" {\n",
" \"name\": \"reject_request\",\n",
" \"description\": \"Use this function if the request is not possible.\",\n",
" \"parameters\": {\n",
" \"type\": \"object\",\n",
" \"properties\": {}\n",
" }\n",
" },\n",
"]\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For starters, let's see how function calling performs with some straight forward feasible prompts, and then one obviously impossible request which call the 'reject_request' function."
"Nice! The model performs quite well with these requests. Now let's try some more difficult requests: requests that are *almost* feasible and are drone-related, but that the drone cannot actually do, and the pilot should reject."
"The model here should reject all of these requests, as they are impossible given the functions, however instead the model calls functions that are somewhat related to the request, but incorrect. The model sets the camera to video when asked to begin 'live streaming to social media', and changes the LED's to blue when asked to 'change the paint color'...\\\n",
"<br>\n",
"In this simple case, more prompt engineering may resolve some of these issues, but for the purpose of this example we will demonstrate how fine tuning can be used to improve performance. Additionally, while this case is relatively straightforward, as the number of and complexity of the functions increases, fine tuning becomes more and more impactful."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Generating synthetic data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Helper functions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We want to generate every invocation of every function, so that we have\n",
"full coverage of all potential invocations to create synthetic data for. Then, we will use `gpt-4` to come up with prompts that would call each invocation, and we will use that prompt - function invocation pair as training data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Generating every invocation for a function with fixed enums is more simple, but for a function such as\n",
" `control_gimbal` we need to set the `tilt` and `pan` integer values, so to generate those synthetic invocations we will first set a placeholder, and then later use `gpt-4` to come up with reasonable values."
"1) Input reasonable values for 'fill_in_string' and 'fill_in_int' in the invocation here: {invocation}. Reasonable values are determined by the function definition. Use the\n",
"the entire function provided here :{function} to get context over what proper fill_in_string and fill_in_int values would be.\n",
"Example:\n",
"\n",
"Input: invocation: {{\n",
" \"name\": \"control_camera\",\n",
" \"arguments\": {{\n",
" \"mode\":\"video\",\n",
" \"duration\":\"fill_in_int\"\n",
" }}\n",
"}},\n",
"function:{function}\n",
"\n",
"Output: invocation: {{\n",
" \"name\": \"control_camera\",\n",
" \"arguments\": {{\n",
" \"mode\":\"video\",\n",
" \"duration\": 30\n",
" }}\n",
"}}\n",
"\n",
"\n",
"MAKE SURE output is just a dictionary with keys 'name' and 'arguments', no other text or response.\n",
"\n",
"Input: {invocation}\n",
"Output:\n",
"\"\"\"\n",
"\n",
"\n",
"COMMAND_GENERATION_PROMPT= \"\"\"\n",
"You are to output 2 commands, questions or statements that would generate the inputted function and parameters.\n",
"Please make the commands or questions natural, as a person would ask, and the command or questions should be varied and not repetitive.\n",
"It should not always mirror the exact technical terminology used in the function and parameters, rather reflect a conversational and intuitive request.\n",
"For instance, the prompt should not be 'turn on the dome light', as that is too technical, but rather 'turn on the inside lights'.\n",
"Another example, is the prompt should not be 'turn on the HVAC', but rather 'turn on the air conditioning'. Use language a normal driver would use, even if\n",
"it is technically incorrect but colloquially used.\n",
"\n",
"RULES: ALWAYS put a backwards slash before an apostrophe or single quote '. For example, do not say don't but say don\\'t.\n",
"Prompt: [\"OK, I want to take back pilot control now\",\"Turn off the automatic pilot I'm ready control it\"]\n",
"\n",
"Input: {invocation}\n",
"Prompt:\n",
"\"\"\"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the below snippet, we generate the invocation of each function except for the rejection_request function.\\\n",
"To perform effective fine-tuning we need correctly labeled data. We could manually come up with examples and label the data,\\\n",
"or we can generate synthetic data with the help of `gpt-4` <br>\n",
"Empirically, `gpt-4` needs a bit more help to get good realistic examples of prompts that would generate the reject_request function, so we'll do that next..."
"Now let's format the training examples properly. For more documentation on the proper training data formatting for fine tuning for function calling, see here: https://platform.openai.com/docs/guides/fine-tuning/fine-tuning-examples\n"
"Now, back to the rejection function. Let's generate some prompts that are *nearly* possible, but should result in the `decline_request` function being called. To do so, we queried `gpt-4` asking for requests that are related to, but not quite possible with, the given list of functions. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"reject_list = ['Translate broadcast message to another language',\n",
"'Automatically capture photos when face is detected',\n",
"'Detect nearby drones',\n",
"'Measure wind resistance',\n",
"'Capture slow motion video',\n",
"\"Adjust drone's altitude to ground level changes\",\n",
"'Display custom message on LED display',\n",
"\"Sync drone's time with smartphone\",\n",
"'Alert when drone travels out of designated area',\n",
"'Detect moisture levels',\n",
"'Automatically follow GPS tagged object',\n",
"'Toggle night vision mode',\n",
"'Maintain current altitude when battery is low',\n",
"'Decide best landing spot using AI',\n",
"\"Program drone's route based on wind direction\"]\n"