# GPTBot
GPTBot is OpenAI's web crawler and can be identified by the following [user agent](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent) token and full user-agent string.
```
User agent token: GPTBot
Full user-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)
```
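
As a rough illustration of identifying the crawler from the token above, the following Python sketch checks a request's `User-Agent` header for `GPTBot`. The function name `is_gptbot` is hypothetical, and the case-insensitive substring match is an assumption rather than a documented matching rule.

```python
def is_gptbot(user_agent: str) -> bool:
    """Return True if the User-Agent header contains the GPTBot token.

    Hypothetical helper: a case-insensitive substring match is assumed here.
    """
    return "gptbot" in (user_agent or "").lower()


if __name__ == "__main__":
    full_string = (
        "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
        "GPTBot/1.0; +https://openai.com/gptbot)"
    )
    print(is_gptbot(full_string))
    print(is_gptbot("Mozilla/5.0 (X11; Linux x86_64)"))
```

Because any client can send an arbitrary `User-Agent` header, a check like this is best treated as a first-pass filter only.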
## Usage
Web pages crawled with the GPTBot user agent may be used to improve future models and are filtered to remove sources that require paywall access, are known to primarily aggregate personally identifiable information (PII), or have text that violates our policies. Allowing GPTBot to access your site can help AI models become more accurate and improve their general capabilities and safety. The sections below also describe how to disallow GPTBot from accessing your site.
### Disallowing GPTBot
To disallow GPTBot from accessing your site, add the GPTBot token to your site's robots.txt:
```
User-agent: GPTBot
Disallow: /
```
### Customizing GPTBot access
To allow GPTBot to access only parts of your site, add the GPTBot token to your site's robots.txt like this:
```
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
```
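
Before deploying rules like these, it can help to check how a parser interprets them. The sketch below uses Python's standard `urllib.robotparser` to evaluate the rules for the `GPTBot` token. The helper name `gptbot_can_fetch` and the `example.com` URLs are hypothetical, and Python's parser applies rules in file order, which may not match how every crawler resolves overlapping rules.

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the robots.txt example above.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
"""


def gptbot_can_fetch(robots_txt: str, url: str) -> bool:
    """Return True if the given robots.txt text lets GPTBot fetch `url`.

    Hypothetical helper built on the standard-library parser.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("GPTBot", url)


if __name__ == "__main__":
    for path in ("/directory-1/page.html", "/directory-2/page.html"):
        url = "https://example.com" + path  # placeholder host
        print(path, gptbot_can_fetch(ROBOTS_TXT, url))
```

The same helper can be pointed at a `Disallow: /` rule set to confirm that every path is blocked for GPTBot.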
### GPTBot and ChatGPT-User
OpenAI has two separate user agents for web crawling and user browsing, so you can tell which use case a given request is for. Our opt-out system currently treats both user agents the same, so any robots.txt disallow for one agent will cover both. [Read more about ChatGPT-User here](https://platform.openai.com/docs/plugins/bot).
### IP egress ranges
For OpenAI's crawler, calls to websites will be made from the IP address block documented on the [OpenAI website](https://openai.com/gptbot.json).
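
A request's source address can be compared against the published ranges once they have been extracted as CIDR strings. The following Python sketch shows only that comparison step, using the standard `ipaddress` module. The helper name `ip_in_ranges` is hypothetical, the CIDR values are documentation placeholders rather than OpenAI's real ranges, and the structure of the published JSON file is deliberately not assumed here.

```python
import ipaddress
from typing import Iterable


def ip_in_ranges(ip: str, cidr_ranges: Iterable[str]) -> bool:
    """Return True if `ip` falls inside any of the given CIDR ranges.

    Hypothetical helper: the caller supplies CIDR strings taken from the
    published list; how that list is fetched and parsed is left out.
    """
    address = ipaddress.ip_address(ip)
    return any(
        address in ipaddress.ip_network(cidr, strict=False)
        for cidr in cidr_ranges
    )


if __name__ == "__main__":
    # Placeholder ranges from the reserved documentation blocks,
    # NOT the actual GPTBot egress ranges.
    placeholder_ranges = ["192.0.2.0/24", "2001:db8::/32"]
    print(ip_in_ranges("192.0.2.10", placeholder_ranges))
    print(ip_in_ranges("198.51.100.1", placeholder_ranges))
```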