# AI crawlers in 2026
A dozen AI crawler user-agents account for virtually all agent-originated traffic in 2026. Blanket-blocking them means no citations from ChatGPT, Claude, Perplexity, or Google AI Overviews. Block selectively with robots.txt and guide the rest with llms.txt.
## The crawlers that matter
| Crawler | User-agent | Purpose | Who it feeds |
|---|---|---|---|
| GPTBot | GPTBot | training | OpenAI foundation models |
| ChatGPT-User | ChatGPT-User | on-demand browsing | ChatGPT |
| OAI-SearchBot | OAI-SearchBot | search index | SearchGPT |
| ClaudeBot | ClaudeBot | training + retrieval | Anthropic |
| Claude-User | Claude-User | Claude Code, live agent fetches | Claude products |
| anthropic-ai | anthropic-ai | legacy | Anthropic |
| PerplexityBot | PerplexityBot | retrieval | Perplexity |
| Perplexity-User | Perplexity-User | live agent fetches | Perplexity |
| Google-Extended | Google-Extended | training opt-out | Gemini, Vertex |
| CCBot | CCBot | Common Crawl | everyone indirectly |
| Bytespider | Bytespider | training | ByteDance/Doubao |
| Applebot-Extended | Applebot-Extended | training opt-out | Apple Intelligence |
## Block vs allow
Default guidance in 2026:
- Allow user-facing agents: ChatGPT-User, Claude-User, Perplexity-User. Blocking them means your content can't be cited live.
- Allow search-style bots: OAI-SearchBot and PerplexityBot; they feed answer engines that cite you. Google-Extended is a robots.txt control token honored by Googlebot rather than a separate fetcher; leave it unblocked to stay available to Gemini.
- Consider blocking training bots: GPTBot, ClaudeBot, CCBot, Bytespider, and the legacy anthropic-ai token. Your content still shows up in live fetches; you only opt out of future training corpora.
This is a legal and commercial call; pick once, apply sitewide, revisit quarterly.
## robots.txt template
```txt
# AI — training bots (opt-out unless you have a commercial reason)
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: anthropic-ai
Disallow: /

# AI — live agent fetchers (allow, so your content gets cited)
User-agent: ChatGPT-User
Allow: /

User-agent: Claude-User
Allow: /

User-agent: Perplexity-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

# Default
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap.md
```
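Before deploying, you can sanity-check that the rules do what you intend with Python's stdlib `urllib.robotparser`. A minimal sketch, assuming the file is tested from a string; the inlined rules abbreviate the template above:

```python
from urllib.robotparser import RobotFileParser

# Abbreviated copy of the template above, parsed from a string
# instead of being fetched over HTTP.
TEMPLATE = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /

User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(TEMPLATE.splitlines())

# Training bot should be blocked; live agent fetcher should be allowed.
print(rp.can_fetch("GPTBot", "https://example.com/post"))        # False
print(rp.can_fetch("ChatGPT-User", "https://example.com/post"))  # True
```

This is also a cheap regression test to run in CI whenever the robots.txt changes, so an edit to one group never silently flips another bot's access.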
## Check which bots actually fetched you
Server logs are authoritative. Grep for:
```shell
grep -E 'GPTBot|ChatGPT-User|ClaudeBot|Claude-User|PerplexityBot|Perplexity-User|OAI-SearchBot|Bytespider|CCBot|Google-Extended|Applebot-Extended' access.log
```
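To go from raw matches to a per-bot tally, the same pattern can drive a short Python sketch. The sample log lines below are made up for illustration; real formats vary by server configuration:

```python
import re
from collections import Counter

# Hypothetical access.log lines; substitute open("access.log") in practice.
LOG_LINES = [
    '1.2.3.4 - - [10/Jan/2026:00:00:01 +0000] "GET /post HTTP/1.1" 200 1234 "-" "Mozilla/5.0 (compatible; GPTBot/1.2)"',
    '5.6.7.8 - - [10/Jan/2026:00:00:02 +0000] "GET /post HTTP/1.1" 200 1234 "-" "ChatGPT-User/1.0"',
    '5.6.7.8 - - [10/Jan/2026:00:00:03 +0000] "GET /faq HTTP/1.1" 200 1234 "-" "ChatGPT-User/1.0"',
]

# Same alternation as the grep above.
AI_BOTS = re.compile(
    r"GPTBot|ChatGPT-User|ClaudeBot|Claude-User|PerplexityBot|Perplexity-User"
    r"|OAI-SearchBot|Bytespider|CCBot|Google-Extended|Applebot-Extended"
)

# Count the first matching bot name on each line; skip non-bot lines.
hits = Counter(m.group(0) for line in LOG_LINES if (m := AI_BOTS.search(line)))
for bot, n in hits.most_common():
    print(bot, n)
```

Note that the alternation tries each name at every position, so `ChatGPT-User` lines are counted as ChatGPT-User, not GPTBot.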
## Pair with llms.txt
robots.txt controls who gets in; llms.txt gives the agents you do let in a structured map of your content.
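A minimal llms.txt sketch following the proposed format (an H1 title, a blockquote summary, then H2 sections of annotated links); the paths and titles here are placeholders:

```markdown
# Example Site

> One-sentence summary of what the site covers and who it is for.

## Docs

- [Quickstart](https://example.com/docs/quickstart.md): install and first run
- [API reference](https://example.com/docs/api.md): endpoints and authentication

## Optional

- [Changelog](https://example.com/changelog.md): release history
```

Serve it at the site root (`/llms.txt`), and point the `.md` links at clean markdown versions of the pages, like the `sitemap.md` referenced in the robots.txt template.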