# AI crawlers in 2026
A dozen AI crawler user-agents account for virtually all agent-originated traffic in 2026. Blanket-blocking them means no citations from ChatGPT, Claude, Perplexity, or Google AI Overviews. Block selectively with robots.txt and guide the rest with llms.txt.
## The crawlers that matter
| Crawler | User-agent | Purpose | Who it feeds |
|---|---|---|---|
| GPTBot | GPTBot | training | OpenAI foundation models |
| ChatGPT-User | ChatGPT-User | on-demand browsing | ChatGPT |
| OAI-SearchBot | OAI-SearchBot | search index | SearchGPT |
| ClaudeBot | ClaudeBot | training + retrieval | Anthropic |
| Claude-User | Claude-User | Claude Code, live agent fetches | Claude products |
| anthropic-ai | anthropic-ai | legacy | Anthropic |
| PerplexityBot | PerplexityBot | retrieval | Perplexity |
| Perplexity-User | Perplexity-User | live agent fetches | Perplexity |
| Google-Extended | Google-Extended | training opt-out | Gemini, Vertex |
| CCBot | CCBot | Common Crawl | everyone indirectly |
| Bytespider | Bytespider | training | ByteDance/Doubao |
| Applebot-Extended | Applebot-Extended | training opt-out | Apple Intelligence |
## Block vs allow
Default guidance in 2026:
- Allow user-facing agents: ChatGPT-User, Claude-User, Perplexity-User. Blocking them means your content can't be cited live.
- Allow search-style bots: OAI-SearchBot and PerplexityBot; they feed answer engines that cite you. Google-Extended is a robots.txt control token honored by Googlebot rather than a separate fetcher; leave it unblocked to stay available to Gemini.
- Consider blocking training bots: GPTBot, ClaudeBot, CCBot, Bytespider, and the legacy anthropic-ai token. Your content still shows up in live fetches; you only opt out of future training corpora.
This is a legal and commercial call; pick once, apply sitewide, revisit quarterly.
## robots.txt template
```txt
# AI — training bots (opt-out unless you have a commercial reason)
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: anthropic-ai
Disallow: /

# AI — live agent fetchers (allow, so your content gets cited)
User-agent: ChatGPT-User
Allow: /

User-agent: Claude-User
Allow: /

User-agent: Perplexity-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

# Default
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap.md
```
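Before deploying, you can sanity-check that the rules do what you intend with Python's stdlib `urllib.robotparser`. A minimal sketch, assuming the file is tested from a string; the inlined rules abbreviate the template above:

```python
from urllib.robotparser import RobotFileParser

# Abbreviated copy of the template above, parsed from a string
# instead of being fetched over HTTP.
TEMPLATE = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /

User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(TEMPLATE.splitlines())

# Training bot should be blocked; live agent fetcher should be allowed.
print(rp.can_fetch("GPTBot", "https://example.com/post"))        # False
print(rp.can_fetch("ChatGPT-User", "https://example.com/post"))  # True
```

This is also a cheap regression test to run in CI whenever the robots.txt changes, so an edit to one group never silently flips another bot's access.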
## Check which bots actually fetched you
Server logs are authoritative. Grep for:
```shell
grep -E 'GPTBot|ChatGPT-User|ClaudeBot|Claude-User|PerplexityBot|Perplexity-User|OAI-SearchBot|Bytespider|CCBot|Google-Extended|Applebot-Extended' access.log
```
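To go from raw matches to a per-bot tally, the same pattern can drive a short Python sketch. The sample log lines below are made up for illustration; real formats vary by server configuration:

```python
import re
from collections import Counter

# Hypothetical access.log lines; substitute open("access.log") in practice.
LOG_LINES = [
    '1.2.3.4 - - [10/Jan/2026:00:00:01 +0000] "GET /post HTTP/1.1" 200 1234 "-" "Mozilla/5.0 (compatible; GPTBot/1.2)"',
    '5.6.7.8 - - [10/Jan/2026:00:00:02 +0000] "GET /post HTTP/1.1" 200 1234 "-" "ChatGPT-User/1.0"',
    '5.6.7.8 - - [10/Jan/2026:00:00:03 +0000] "GET /faq HTTP/1.1" 200 1234 "-" "ChatGPT-User/1.0"',
]

# Same alternation as the grep above.
AI_BOTS = re.compile(
    r"GPTBot|ChatGPT-User|ClaudeBot|Claude-User|PerplexityBot|Perplexity-User"
    r"|OAI-SearchBot|Bytespider|CCBot|Google-Extended|Applebot-Extended"
)

# Count the first matching bot name on each line; skip non-bot lines.
hits = Counter(m.group(0) for line in LOG_LINES if (m := AI_BOTS.search(line)))
for bot, n in hits.most_common():
    print(bot, n)
```

Note that the alternation tries each name at every position, so `ChatGPT-User` lines are counted as ChatGPT-User, not GPTBot.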
## Pair with llms.txt
robots.txt controls who gets in; llms.txt gives the agents you do let in a structured map of your content.
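A minimal llms.txt sketch following the proposed format (an H1 title, a blockquote summary, then H2 sections of annotated links); the paths and titles here are placeholders:

```markdown
# Example Site

> One-sentence summary of what the site covers and who it is for.

## Docs

- [Quickstart](https://example.com/docs/quickstart.md): install and first run
- [API reference](https://example.com/docs/api.md): endpoints and authentication

## Optional

- [Changelog](https://example.com/changelog.md): release history
```

Serve it at the site root (`/llms.txt`), and point the `.md` links at clean markdown versions of the pages, like the `sitemap.md` referenced in the robots.txt template.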