AI crawlers in 2026

A handful of AI crawlers (twelve user-agent tokens are listed below) account for virtually all agent-originated traffic in 2026. Blanket-blocking them means no citations from ChatGPT, Claude, Perplexity, or Google AI Overviews. Block selectively with robots.txt and guide the rest with llms.txt.

The crawlers that matter

Crawler            User-agent         Purpose                          Who it feeds
GPTBot             GPTBot             training                         OpenAI foundation models
ChatGPT-User       ChatGPT-User       on-demand browsing               ChatGPT
OAI-SearchBot      OAI-SearchBot      search index                     SearchGPT
ClaudeBot          ClaudeBot          training + retrieval             Anthropic
Claude-User        Claude-User        Claude Code, live agent fetches  Claude products
anthropic-ai       anthropic-ai       legacy                           Anthropic
PerplexityBot      PerplexityBot      retrieval                        Perplexity
Perplexity-User    Perplexity-User    live agent fetches               Perplexity
Google-Extended    Google-Extended    training opt-out                 Gemini, Vertex
CCBot              CCBot              Common Crawl                     everyone indirectly
Bytespider         Bytespider         training                         ByteDance/Doubao
Applebot-Extended  Applebot-Extended  training opt-out                 Apple Intelligence

Block vs allow

Default guidance in 2026:

  • Allow user-facing agents: ChatGPT-User, Claude-User, Perplexity-User. Blocking them means your content can't be cited live.
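  • Allow search-style bots: OAI-SearchBot, PerplexityBot. Leave Google-Extended allowed as well unless you want to opt out of Gemini; it is a robots.txt control token, not a separate crawler.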
  • Consider blocking training bots: GPTBot, ClaudeBot, CCBot, Bytespider. Your content still shows up in live fetches; you just opt out of future training corpora.

This is a legal and commercial call; pick once, apply sitewide, revisit quarterly.

robots.txt template

# AI — training bots (opt-out unless you have a commercial reason)
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: anthropic-ai
Disallow: /

# AI — live agent fetchers (allow, so your content gets cited)
User-agent: ChatGPT-User
Allow: /

User-agent: Claude-User
Allow: /

User-agent: Perplexity-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

# Default
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap.md
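
Before deploying, it can help to confirm that the rules do what you expect. The following is a minimal sketch using Python's standard-library urllib.robotparser; the helper name check_access and the test URL are illustrative assumptions, and the rules are the template above with comments and Sitemap lines left out. Each crawler runs its own parser, so treat this as a sanity check rather than a guarantee of how any given bot will behave.

```python
from urllib.robotparser import RobotFileParser

# The template above, minus comments and Sitemap lines.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: ChatGPT-User
Allow: /

User-agent: Claude-User
Allow: /

User-agent: Perplexity-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Allow: /
"""


def check_access(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Return {agent: allowed?} for one URL (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    agents = ["GPTBot", "ClaudeBot", "ChatGPT-User", "Claude-User", "PerplexityBot"]
    # https://example.com/guides/ is a placeholder URL.
    for agent, allowed in check_access(ROBOTS_TXT, agents, "https://example.com/guides/").items():
        print(f"{agent:16} {'allow' if allowed else 'block'}")
```

PerplexityBot has no group of its own in the template, so it falls through to the `User-agent: *` group and is allowed.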

Check which bots actually fetched you

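Server logs are authoritative. Google-Extended and Applebot-Extended are robots.txt control tokens rather than request user agents, so they never appear in logs; grep for the rest:

grep -E 'GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|Claude-User|anthropic-ai|PerplexityBot|Perplexity-User|Bytespider|CCBot' access.log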
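
A raw grep shows matching lines but not how the traffic splits across bots. As a rough sketch, the same search can be turned into a per-bot tally in Python. The function name count_ai_hits and the sample log lines are made up for illustration; real user-agent strings carry more detail. Google-Extended and Applebot-Extended are left out because they are robots.txt tokens that are not sent as request user agents.

```python
import re
from collections import Counter

# User-agent tokens to look for (taken from the table above).
AI_AGENTS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot",
    "ClaudeBot", "Claude-User", "anthropic-ai",
    "PerplexityBot", "Perplexity-User",
    "CCBot", "Bytespider",
]

AGENT_RE = re.compile("|".join(re.escape(a) for a in AI_AGENTS), re.IGNORECASE)
CANONICAL = {a.lower(): a for a in AI_AGENTS}


def count_ai_hits(lines):
    """Count log lines per AI user-agent token (first match on each line)."""
    counts = Counter()
    for line in lines:
        match = AGENT_RE.search(line)
        if match:
            counts[CANONICAL[match.group(0).lower()]] += 1
    return counts


if __name__ == "__main__":
    # Invented sample lines; in practice pass an open file instead, e.g.
    #   with open("access.log", encoding="utf-8", errors="replace") as f:
    #       counts = count_ai_hits(f)
    sample = [
        '203.0.113.5 - - "GET / HTTP/1.1" 200 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
        '203.0.113.9 - - "GET /guides HTTP/1.1" 200 "-" "Mozilla/5.0 (compatible; Claude-User/1.0)"',
        '198.51.100.2 - - "GET / HTTP/1.1" 200 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    ]
    for agent, hits in count_ai_hits(sample).most_common():
        print(f"{agent:16} {hits}")
```

User-agent strings are trivially spoofed, so a match here says what a client claimed to be, not who it was.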

Pair with llms.txt

robots.txt controls access; llms.txt hands the agents you do let in a structured map of the site.
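
To make "structured map" concrete, here is a small sketch that renders an llms.txt file. It assumes the layout described in the public llms.txt proposal (a title heading, a one-line blockquote summary, then sections of annotated links). The function name render_llms_txt, the site name, and every URL are placeholders, not part of any real site.

```python
def render_llms_txt(title, summary, sections):
    """Render llms.txt text (hypothetical helper).

    sections maps a section heading to a list of (name, url, note) tuples.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            lines.append(f"- [{name}]({url}): {note}")
        lines.append("")
    # Collapse trailing blank lines to a single final newline.
    return "\n".join(lines).rstrip("\n") + "\n"


if __name__ == "__main__":
    # All names and URLs below are placeholders.
    print(render_llms_txt(
        "Example Site",
        "Guides on making a site readable by AI agents.",
        {
            "Guides": [
                ("AI crawlers", "https://example.com/guides/ai-crawlers.md",
                 "which bots to allow or block"),
            ],
        },
    ), end="")
```

The output would be served at the site root as /llms.txt, alongside robots.txt.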
