GPT-4.1 Guide updates (#1776)

2025-05-09 19:32:38 +00:00 · 2025-04-15 10:46:26 -07:00 · 2025-04-15 10:46:26 -07:00 · dc8f255a40
commit dc8f255a40
parent acfa8abce9
2 changed files with 154 additions and 47 deletions
--- a/examples/gpt4-1_prompting_guide.ipynb
+++ b/examples/gpt4-1_prompting_guide.ipynb
@@ -63,10 +63,42 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": 7,
   "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[{'id': 'msg_67fe92df26ac819182ffafce9ff4e4fc07c7e06242e51f8b',\n",
+       "  'content': [{'annotations': [],\n",
+       "    'text': \"Thank you for the report, but “Typerror” is too vague for me to start debugging right away.\\n\\n**To make progress, I need to:**\\n1. Find the exact error message text (e.g. `'TypeError: ...'`).\\n2. Find which file and which line/function/class the error occurred in.\\n3. Figure out what triggered the error (test file, usage, reproduction steps).\\n4. Find the root cause and details.\\n\\n**Next steps:**\\n- Investigate error/log/test output files for a Python `TypeError` message.\\n- Examine the relevant code sections for problematic type usage.\\n- If possible, reproduce the bug locally.\\n\\n**Plan:**\\n- First, I will search for test files and log output in the `/testbed` directory that may contain the full error message and stack trace.\\n\\nLet’s start by listing the contents of the `/testbed` directory to look for clues.\",\n",
+       "    'type': 'output_text'}],\n",
+       "  'role': 'assistant',\n",
+       "  'status': 'completed',\n",
+       "  'type': 'message'},\n",
+       " {'arguments': '{\"input\":\"!ls -l /testbed\"}',\n",
+       "  'call_id': 'call_frnxyJgKi5TsBem0nR9Zuzdw',\n",
+       "  'name': 'python',\n",
+       "  'type': 'function_call',\n",
+       "  'id': 'fc_67fe92e3da7081918fc18d5c96dddc1c07c7e06242e51f8b',\n",
+       "  'status': 'completed'}]"
+      ]
+     },
+     "execution_count": 7,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
   "source": [
+    "from openai import OpenAI\n",
+    "import os\n",
+    "\n",
+    "client = OpenAI(\n",
+    "    api_key=os.environ.get(\n",
+    "        \"OPENAI_API_KEY\", \"<your OpenAI API key if not set as env var>\"\n",
+    "    )\n",
+    ")\n",
+    "\n",
    "SYS_PROMPT_SWEBENCH = \"\"\"\n",
    "You will be tasked to fix an issue from an open-source repository.\n",
    "\n",
@@ -145,7 +177,98 @@
    "- Run these new tests and ensure they all pass.\n",
    "- Be aware that there are additional hidden tests that must also pass for the solution to be successful.\n",
    "- Do not assume the task is complete just because the visible tests pass; continue refining until you are confident the fix is robust and comprehensive.\n",
-    "\"\"\""
+    "\"\"\"\n",
+    "\n",
+    "PYTHON_TOOL_DESCRIPTION = \"\"\"This function is used to execute Python code or terminal commands in a stateful Jupyter notebook environment. python will respond with the output of the execution or time out after 60.0 seconds. Internet access for this session is disabled. Do not make external web requests or API calls as they will fail. Just as in a Jupyter notebook, you may also execute terminal commands by calling this function with a terminal command, prefaced with an exclamation mark.\n",
+    "\n",
+    "In addition, for the purposes of this task, you can call this function with an `apply_patch` command as input.  `apply_patch` effectively allows you to execute a diff/patch against a file, but the format of the diff specification is unique to this task, so pay careful attention to these instructions. To use the `apply_patch` command, you should pass a message of the following structure as \"input\":\n",
+    "\n",
+    "%%bash\n",
+    "apply_patch <<\"EOF\"\n",
+    "*** Begin Patch\n",
+    "[YOUR_PATCH]\n",
+    "*** End Patch\n",
+    "EOF\n",
+    "\n",
+    "Where [YOUR_PATCH] is the actual content of your patch, specified in the following V4A diff format.\n",
+    "\n",
+    "*** [ACTION] File: [path/to/file] -> ACTION can be one of Add, Update, or Delete.\n",
+    "For each snippet of code that needs to be changed, repeat the following:\n",
+    "[context_before] -> See below for further instructions on context.\n",
+    "- [old_code] -> Precede the old code with a minus sign.\n",
+    "+ [new_code] -> Precede the new, replacement code with a plus sign.\n",
+    "[context_after] -> See below for further instructions on context.\n",
+    "\n",
+    "For instructions on [context_before] and [context_after]:\n",
+    "- By default, show 3 lines of code immediately above and 3 lines immediately below each change. If a change is within 3 lines of a previous change, do NOT duplicate the first change's [context_after] lines in the second change's [context_before] lines.\n",
+    "- If 3 lines of context is insufficient to uniquely identify the snippet of code within the file, use the @@ operator to indicate the class or function to which the snippet belongs. For instance, we might have:\n",
+    "@@ class BaseClass\n",
+    "[3 lines of pre-context]\n",
+    "- [old_code]\n",
+    "+ [new_code]\n",
+    "[3 lines of post-context]\n",
+    "\n",
+    "- If a code block is repeated so many times in a class or function such that even a single @@ statement and 3 lines of context cannot uniquely identify the snippet of code, you can use multiple `@@` statements to jump to the right context. For instance:\n",
+    "\n",
+    "@@ class BaseClass\n",
+    "@@ \tdef method():\n",
+    "[3 lines of pre-context]\n",
+    "- [old_code]\n",
+    "+ [new_code]\n",
+    "[3 lines of post-context]\n",
+    "\n",
+    "Note, then, that we do not use line numbers in this diff format, as the context is enough to uniquely identify code. An example of a message that you might pass as \"input\" to this function, in order to apply a patch, is shown below.\n",
+    "\n",
+    "%%bash\n",
+    "apply_patch <<\"EOF\"\n",
+    "*** Begin Patch\n",
+    "*** Update File: pygorithm/searching/binary_search.py\n",
+    "@@ class BaseClass\n",
+    "@@     def search():\n",
+    "-        pass\n",
+    "+        raise NotImplementedError()\n",
+    "\n",
+    "@@ class Subclass\n",
+    "@@     def search():\n",
+    "-        pass\n",
+    "+        raise NotImplementedError()\n",
+    "\n",
+    "*** End Patch\n",
+    "EOF\n",
+    "\n",
+    "File references can only be relative, NEVER ABSOLUTE. After the apply_patch command is run, python will always say \"Done!\", regardless of whether the patch was successfully applied or not. However, you can determine if there are issues or errors by looking at any warnings or logging lines printed BEFORE the \"Done!\" is output.\n",
+    "\"\"\"\n",
+    "\n",
+    "python_bash_patch_tool = {\n",
+    "  \"type\": \"function\",\n",
+    "  \"name\": \"python\",\n",
+    "  \"description\": PYTHON_TOOL_DESCRIPTION,\n",
+    "  \"parameters\": {\n",
+    "      \"type\": \"object\",\n",
+    "      \"strict\": True,\n",
+    "      \"properties\": {\n",
+    "          \"input\": {\n",
+    "              \"type\": \"string\",\n",
+    "              \"description\": \"The Python code, terminal command (prefaced by an exclamation mark), or apply_patch command that you wish to execute.\",\n",
+    "          }\n",
+    "      },\n",
+    "      \"required\": [\"input\"],\n",
+    "  },\n",
+    "}\n",
+    "\n",
+    "# Additional harness setup:\n",
+    "# - Add your repo to /testbed\n",
+    "# - Add your issue to the first user message\n",
+    "# - Note: Even though we used a single tool for python, bash, and apply_patch, we generally recommend defining more granular tools that are focused on a single function\n",
+    "\n",
+    "response = client.responses.create(\n",
+    "    instructions=SYS_PROMPT_SWEBENCH,\n",
+    "    model=\"gpt-4.1-2025-04-14\",\n",
+    "    tools=[python_bash_patch_tool],\n",
+    "    input=\"Please answer the following question:\\nBug: Typerror...\"\n",
+    ")\n",
+    "\n",
+    "response.to_dict()[\"output\"]"
   ]
  },
  {
@@ -243,7 +366,7 @@
    "4. If behavior still isn’t working as expected:  \n",
    "   1. Check for conflicting, underspecified, or wrong instructions and examples. If there are conflicting instructions, GPT-4.1 tends to follow the one closer to the end of the prompt.\n",
    "   2. Add examples that demonstrate desired behavior; ensure that any important behavior demonstrated in your examples are also cited in your rules.\n",
-    "   3. It’s generally not necessary to use all-caps or other incentives like bribes or tips, but developers can experiment with this for extra emphasis if so desired.\n",
+    "   3. It’s generally not necessary to use all-caps or other incentives like bribes or tips. We recommend starting without these, and only reaching for them if necessary for your particular prompt. Note that if your existing prompts include these techniques, GPT-4.1 may follow them too strictly.\n",
    "\n",
    "*Note that using your preferred AI-powered IDE can be very helpful for iterating on prompts, including checking for consistency or conflicts, adding examples, or making cohesive updates like adding an instruction and updating instructions to demonstrate that instruction.*\n",
    "\n",
@@ -269,36 +392,33 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 1,
+   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
-       "[{'id': 'msg_67fc8e208ce88191a8511546572ef8a502938f4dcc4628bb',\n",
+       "[{'id': 'msg_67fe92d431548191b7ca6cd604b4784b06efc5beb16b3c5e',\n",
       "  'content': [{'annotations': [],\n",
-       "    'text': \"Hi, you've reached NewTelco, how can I help you? 😊💬\\n\\nYou’d like to know why your last bill was higher than expected. To help you with that, I'll just need to verify your account information. Could you please provide your phone number associated with your account? 📱\",\n",
+       "    'text': \"Hi, you've reached NewTelco, how can I help you? 🌍✈️\\n\\nYou'd like to know the cost of international service while traveling to France. 🇫🇷 Let me check the latest details for you—one moment, please. 🕑\",\n",
       "    'type': 'output_text'}],\n",
       "  'role': 'assistant',\n",
       "  'status': 'completed',\n",
-       "  'type': 'message'}]"
+       "  'type': 'message'},\n",
+       " {'arguments': '{\"topic\":\"international service cost France\"}',\n",
+       "  'call_id': 'call_cF63DLeyhNhwfdyME3ZHd0yo',\n",
+       "  'name': 'lookup_policy_document',\n",
+       "  'type': 'function_call',\n",
+       "  'id': 'fc_67fe92d5d6888191b6cd7cf57f707e4606efc5beb16b3c5e',\n",
+       "  'status': 'completed'}]"
      ]
     },
-     "execution_count": 1,
+     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
-    "from openai import OpenAI\n",
-    "import os\n",
-    "\n",
-    "client = OpenAI(\n",
-    "    api_key=os.environ.get(\n",
-    "        \"OPENAI_API_KEY\", \"<your OpenAI API key if not set as env var>\"\n",
-    "    )\n",
-    ")\n",
-    "\n",
    "SYS_PROMPT_CUSTOMER_SERVICE = \"\"\"You are a helpful customer service agent working for NewTelco, helping a user efficiently fulfill their request while adhering closely to provided guidelines.\n",
    "\n",
    "# Instructions\n",
@@ -344,31 +464,18 @@
    "## User\n",
    "Can you tell me about your family plan options?\n",
    "\n",
-    "## Assistant\n",
-    "```\n",
-    "{\n",
-    "  \"role\": \"assistant\",\n",
-    "  \"content\": \"Hi, you've reached NewTelco, how can I help you? 😊🎉\\n\\nYou'd like to know about our family plan options. 🤝 Let me check that for you—one moment, please. 🚀\",\n",
-    "  \"tool_calls\": [\n",
-    "    {\n",
-    "      \"id\": \"call-1\",\n",
-    "      \"type\": \"function\",\n",
-    "      \"function\": {\n",
-    "        \"name\": \"lookup_policy_document\",\n",
-    "        \"arguments\": \"{\\\"topic\\\": \\\"family plan options\\\"}\"\n",
-    "      }\n",
-    "    }\n",
-    "  ]\n",
-    "}\n",
-    "```\n",
+    "## Assistant Response 1\n",
+    "### Message\n",
+    "\"Hi, you've reached NewTelco, how can I help you? 😊🎉\\n\\nYou'd like to know about our family plan options. 🤝 Let me check that for you—one moment, please. 🚀\"\n",
+    "\n",
+    "### Tool Calls\n",
+    "lookup_policy_document(topic=\"family plan options\")\n",
    "\n",
    "// After tool call, the assistant would follow up with:\n",
    "\n",
-    "{\n",
-    "  \"role\": \"assistant\",\n",
-    "  \"content\": \"Okay, here's what I found: 🎉 Our family plan allows up to 5 lines with shared data and a 10% discount for each additional line [Family Plan Policy](ID-010). 📱 Is there anything else I can help you with today? 😊\"\n",
-    "}\n",
-    "```\n",
+    "## Assistant Response 2 (after tool call)\n",
+    "### Message\n",
+    "\"Okay, here's what I found: 🎉 Our family plan allows up to 5 lines with shared data and a 10% discount for each additional line [Family Plan Policy](ID-010). 📱 Is there anything else I can help you with today? 😊\"\n",
    "\"\"\"\n",
    "\n",
    "get_policy_doc = {\n",
@@ -411,8 +518,8 @@
    "    instructions=SYS_PROMPT_CUSTOMER_SERVICE,\n",
    "    model=\"gpt-4.1-2025-04-14\",\n",
    "    tools=[get_policy_doc, get_user_acct],\n",
-    "    input=\"Why was my last bill so high?\"\n",
-    "    # input=\"How much will it cost for international service? I'm traveling to France.\",\n",
+    "    input=\"How much will it cost for international service? I'm traveling to France.\",\n",
+    "    # input=\"Why was my last bill so high?\"\n",
    ")\n",
    "\n",
    "response.to_dict()[\"output\"]"
@@ -467,7 +574,7 @@
    "\n",
    "3. JSON is highly structured and well understood by the model particularly in coding contexts. However it can be more verbose, and require character escaping that can add overhead.\n",
    "\n",
-    "Guidance specifically for adding a large number of documents or files to context:\n",
+    "Guidance specifically for adding a large number of documents or files to input context:\n",
    "\n",
    "* XML performed well in our long context testing.  \n",
    "  * Example: `<doc id=1 title=”The Fox”>The quick brown fox jumps over the lazy dog</doc>`  \n",
@@ -1178,7 +1285,7 @@
 ],
 "metadata": {
  "kernelspec": {
-   "display_name": "Python 3 (ipykernel)",
+   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
@@ -1192,7 +1299,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.11.8"
+   "version": "3.9.6"
  }
 },
 "nbformat": 4,
--- a/registry.yaml
+++ b/registry.yaml
@@ -1906,7 +1906,7 @@
    - gpt-actions-library
    - chatgpt-productivity    

- title: GPT 4.1 Prompting Guide
+- title: GPT-4.1 Prompting Guide
  path: examples/gpt4-1_prompting_guide.ipynb
  date: 2025-04-14
  authors: