"# Managing Function Calls With Reasoning Models\n",
"OpenAI now offers function calling using [reasoning models](https://platform.openai.com/docs/guides/reasoning?api-mode=responses). Reasoning models are trained to follow logical chains of thought, making them better suited for complex or multi-step tasks.\n",
"> _Reasoning models like o3 and o4-mini are LLMs trained with reinforcement learning to perform reasoning. Reasoning models think before they answer, producing a long internal chain of thought before responding to the user. Reasoning models excel in complex problem solving, coding, scientific reasoning, and multi-step planning for agentic workflows. They're also the best models for Codex CLI, our lightweight coding agent._\n",
"All examples in this notebook use the newer [Responses API](https://community.openai.com/t/introducing-the-responses-api/1140929) which provides convenient abstractions for managing conversation state. However the principles here are relevant when using the older chat completions API."
" \"reasoning\": {\"effort\": \"low\", \"summary\": \"auto\"}, # Automatically summarise the reasoning process. Can also choose \"detailed\" or \"none\"\n",
"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's make a simple call to a reasoning model using the Responses API.\n",
"We specify a low reasoning effort and retrieve the response with the helpful `output_text` attribute.\n",
"We can ask follow up questions and use the `previous_response_id` to let OpenAI manage the conversation history automatically"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Among the last four Summer Olympic host cities (Beijing 2008, London 2012, Rio de Janeiro 2016 and Tokyo 2020), Rio de Janeiro has by far the highest mean annual temperature—around 23 °C, compared with about 16 °C in Tokyo, 13 °C in Beijing and 11 °C in London.\n",
"Of those four, London has the lowest mean annual temperature, at roughly 11 °C.\n"
]
}
],
"source": [
"response = client.responses.create(\n",
" input=\"Which of the last four Olympic host cities has the highest average temperature?\",\n",
"We're asking relatively complex questions that may require the model to reason out a plan and proceed through it in steps, but this reasoning is hidden from us - we simply wait a little longer before being shown the response. \n",
"However, if we inspect the output we can see that the model has made use of a hidden set of 'reasoning' tokens that were included in the model context window, but not exposed to us as end users.\n",
"The user is asking about the last four Olympic host cities, assuming it’s for the Summer Olympics. Those would be Beijing in 2008, London in 2012, Rio in 2016, and Tokyo in 2020. They’re interested in the lowest average temperature, which I see is London at around 11°C. Beijing is about 13°C, Tokyo 16°C, but London has the lowest. I should clarify it's the mean annual temperature. So, I'll present it neatly that London is the answer.\n"
"It is important to know about these reasoning tokens, because it means we will consume our available context window more quickly than with traditional chat models.\n",
"What happens if we ask the model a complex request that also requires the use of custom tools?\n",
"* Let's imagine we have more questions about Olympic Cities, but we also have an internal database that contains IDs for each city.\n",
"* It's possible that the model will need to invoke our tool partway through its reasoning process before returning a result.\n",
"* Let's make a function that produces a random UUID and ask the model to reason about these UUIDs. \n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n"
]
}
],
"source": [
"\n",
"def get_city_uuid(city: str) -> str:\n",
" \"\"\"Just a fake tool to return a fake UUID\"\"\"\n",
" uuid = str(uuid4())\n",
" return f\"{city} ID: {uuid}\"\n",
"\n",
"# The tool schema that we will pass to the model\n",
"tools = [\n",
" {\n",
" \"type\": \"function\",\n",
" \"name\": \"get_city_uuid\",\n",
" \"description\": \"Retrieve the internal ID for a city from the internal database. Only invoke this function if the user needs to know the internal ID for a city.\",\n",
" \"parameters\": {\n",
" \"type\": \"object\",\n",
" \"properties\": {\n",
" \"city\": {\"type\": \"string\", \"description\": \"The name of the city to get information about\"}\n",
" },\n",
" \"required\": [\"city\"]\n",
" }\n",
" }\n",
"]\n",
"\n",
"# This is a general practice - we need a mapping of the tool names we tell the model about, and the functions that implement them.\n",
"tool_mapping = {\n",
" \"get_city_uuid\": get_city_uuid\n",
"}\n",
"\n",
"# Let's add this to our defaults so we don't have to pass it every time\n",
"Along with the reasoning step, the model has successfully identified the need for a tool call and passed back instructions to send to our function call. \n",
" raise ValueError(f\"No tool found for function call: {function_call.name}\")\n",
" arguments = json.loads(function_call.arguments) # Load the arguments as a dictionary\n",
" tool_output = target_tool(**arguments) # Invoke the tool with the arguments\n",
" new_conversation_items.append({\n",
" \"type\": \"function_call_output\",\n",
" \"call_id\": function_call.call_id, # We map the call_id back to the original function call\n",
" \"output\": tool_output\n",
" })"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The internal ID for London is ce863d03-9c01-4de2-9af8-96b123852aec.\n"
]
}
],
"source": [
"response = client.responses.create(\n",
" input=new_conversation_items,\n",
" previous_response_id=response.id,\n",
" **MODEL_DEFAULTS\n",
")\n",
"print(response.output_text)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This works great here - as we know that a single function call is all that is required for the model to respond - but we also need to account for situations where multiple tool calls might need to be executed for the reasoning to complete.\n",
"\n",
"## Executing multiple functions in series\n",
"\n",
"Some OpenAI models support the parameter `parallel_tool_calls` which allows the model to return an array of functions which we can then execute in parallel. However, reasoning models may produce a sequence of function calls that must be made in series, particularly as some steps may depend on the results of previous ones.\n",
"As such, we ought to define a general pattern which we can use to handle arbitrarily complex reasoning workflows:\n",
"* At each step in the conversation, initialise a loop\n",
"* If the response contains function calls, we must assume the reasoning is ongoing and we should feed the function results (and any intermediate reasoning) back into the model for further inference\n",
"* If there are no function calls and we instead receive a Reponse.output with a type of 'message', we can safely assume the agent has finished reasoning and we can break out of the loop"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"# Let's wrap our logic above into a function which we can use to invoke tool calls.\n",
"initial_question = \"What are the internal IDs for the cities that have hosted the Olympics in the last 20 years, and which cities have IDs beginning with the number '2'. Use your internal tools to look up the IDs?\"\n",
"\n",
"# We fetch a response and then kick off a loop to handle the response\n",
"In these situations we may wish to take full control of the conversation. Rather than using `previous_message_id` we can instead treat the API as 'stateless' and make and maintain an array of conversation items that we send to the model as input each time.\n",
"User message: Of those cities that have hosted the summer Olympic games in the last 20 years - do any of them have IDs beginning with a prime number? Use your available tools to look up the IDs for each city.\n",
"• London (2012): 72ff5a9d-d147-4ba8-9a87-64e3572ba3bc \n",
" – Leading “72” → 72 is not prime \n",
"• Rio de Janeiro (2016): 7a45a392-b43a-41be-8eaf-07ec44d42a2b \n",
" – Leading “7” → 7 is prime \n",
"• Tokyo (2020): f725244f-079f-44e1-a91c-5c31c270c209 \n",
" – Leading “f” → no numeric prefix \n",
"• Paris (2024): b0230ad4-bc35-48be-a198-65a9aaf28fb5 \n",
" – Leading “b” → no numeric prefix \n",
"\n",
"Conclusion: After the update, only Rio de Janeiro’s ID begins with a prime number (“7”).\n",
"Total tokens used: 9734 (4.87% of o4-mini's context window)\n"
]
}
],
"source": [
"# Let's initialise our conversation with the first user message\n",
"total_tokens_used = 0\n",
"user_messages = [\n",
" \"Of those cities that have hosted the summer Olympic games in the last 20 years - do any of them have IDs beginning with a prime number? Use your available tools to look up the IDs for each city.\",\n",
" \"Great thanks! We've just updated the IDs - could you please check again?\"\n",
" # Preserve order of function calls and outputs in case of multiple function calls (currently not supported by reasoning models, but worth considering)\n",
" interleaved = [val for pair in zip(function_calls, function_outputs) for val in pair]\n",
" conversation.extend(interleaved)\n",
" if len(messages) > 0:\n",
" print(response.output_text)\n",
" conversation.extend(messages)\n",
" if len(function_calls) == 0: # No more functions = We're done reasoning and we're ready for the next user message\n",
"In this cookbook, we identified how to combine function calling with OpenAI's reasoning models to demonstrate multi-step tasks that are dependent on external data sources. \n",
"\n",
"Importantly, we covered reasoning-model specific nuances in the function calling process, specifically that:\n",
"* The model may choose to make multiple function calls or reasoning steps in series, and some steps may depend on the results of previous ones\n",
"* We cannot know how many of these steps there will be, so we must process responses with a loop\n",
"* The responses API makes orchestration easy using the `previous_response_id` parameter, but where manual control is needed, it's important to maintain the correct order of conversation item to preserve the 'chain-of-thought'\n",
"\n",
"---\n",
"\n",
"The examples used here are rather simple, but you can imagine how this technique could be extended to more real-world use cases, such as:\n",
"\n",
"* Looking up a customer's transaction history and recent correspondence to determine if they are eligible for a promotional offer\n",
"* Calling recent transaction logs, geolocation data, and device metadata to assess the likelihood of a transaction being fraudulent\n",
"* Reviewing internal HR databases to fetch an employee’s benefits usage, tenure, and recent policy changes to answer personalized HR questions\n",
"* Reading internal dashboards, competitor news feeds, and market analyses to compile a daily executive briefing tailored to their focus areas"