Generate a spec-compliant llms.txt file for any website. Create an AI sitemap so ChatGPT, Claude, and Perplexity can understand your site. Free, no signup, powered by
The generator is free for up to 100 pages. We're exploring a pro tier for teams and larger sites.
This generator uses
Discovers pages from your sitemap, internal links, and common paths.
Extracts content from each page and converts HTML to clean Markdown using mdream.
Groups pages into logical sections based on URL patterns and site structure.
Produces a spec-compliant llms.txt with proper headings, links, and descriptions.
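The output of those four steps follows the llms.txt format: an H1 title, a blockquote summary, and H2 sections containing link lists with optional descriptions. A minimal example (the site name and URLs here are placeholders, not generator output):

```markdown
# Example Docs

> Documentation for Example, a hypothetical project used here for illustration.

## Docs

- [Getting Started](https://example.com/docs/getting-started): Install and configure the project
- [Configuration](https://example.com/docs/config): All configuration options with defaults

## Optional

- [Changelog](https://example.com/changelog): Release history
```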
AI tools are how a growing number of people discover and interact with your content. A well-structured llms.txt helps them get accurate answers from your site.
ChatGPT, Perplexity, Claude Browse, and Google AI Overviews use structured data to cite your content accurately. An llms.txt acts like an AI sitemap for your site.
Developers using AI coding assistants get better answers when your docs have a clear llms.txt index. Essential for open-source projects and API documentation.
Teams building RAG pipelines use llms.txt and llms-full.txt to populate vector databases. A well-structured file means better retrieval quality.
robots.txt controls crawl access (exclusion). llms.txt provides context and structure for the content you do want shared (inclusion). Use both together.
Curate: Edit the generated file. Remove low-value pages. The median file across 95+ projects is 16 KB with 94 links. A focused file under 50 KB performs best.
Describe: Add descriptions after each link. Only 47% of links in the wild include descriptions, but they dramatically improve LLM comprehension.
Section: Group links under meaningful H2 headings. The most common across 95+ files: "Docs", "Guides", "Reference", "Examples".
Optional: Add an "Optional" section for lower-priority links. Only 12% of files use this pattern, but it lets LLMs skip non-essential content when context is limited.
Validate: Run your file through the llms.txt validator to catch formatting issues. 27% of files in the wild violate the spec with H3+ headings.
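The H3+ check in the last step is simple enough to sketch. This is a toy version of that one rule, not the actual llms.txt validator: it flags any heading deeper than H2, since llms.txt sections must be H2 (`## Name`) only.

```typescript
// Toy spec check (not the official validator): return every line that
// uses an H3-or-deeper heading, which the llms.txt spec disallows.
export function findDeepHeadings(llmsTxt: string): string[] {
  return llmsTxt
    .split("\n")
    .filter((line) => /^#{3,}\s/.test(line.trim()));
}
```

For example, `findDeepHeadings("# Title\n\n## Docs\n\n### Nested")` flags only the `### Nested` line.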
It crawls your website, extracts page titles and descriptions, converts HTML to clean Markdown, organizes pages into sections based on your URL structure, and produces a spec-compliant llms.txt file that AI systems like ChatGPT, Claude, and Perplexity can consume.
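The "extracts page titles and descriptions" step can be illustrated with a toy sketch. This is not mdream's actual API (its real parser is far more robust than regexes); it only shows what the extraction step pulls from each fetched page.

```typescript
// Toy sketch of the extraction step: pull the <title> and meta
// description out of a fetched HTML page. Illustrative only; this is
// not how mdream parses HTML.
export function extractMeta(html: string): { title: string; description: string } {
  const title = html.match(/<title[^>]*>([^<]*)<\/title>/i)?.[1]?.trim() ?? "";
  const description =
    html.match(/<meta\s+name=["']description["']\s+content=["']([^"']*)["']/i)?.[1]?.trim() ?? "";
  return { title, description };
}
```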
Most sites complete in under 30 seconds. Larger sites with hundreds of pages may take a minute or two. The progress indicator shows pages found and processed in real time.
Place it at the root of your domain so it's accessible at https://yourdomain.com/llms.txt. For WordPress, add it to your root directory or use a plugin. For Next.js, put it in public/ or create a route handler. For Nuxt, use @mdream/nuxt for automatic generation or place it in public/llms.txt.
Absolutely. The generated file is a starting point. Review it, reorder sections, add descriptions to links, and remove pages that aren't useful for AI context. A curated llms.txt with 20 to 50 focused links performs better than a massive auto-generated list.
This tool generates llms.txt (the concise index with links). For llms-full.txt (which embeds full page content inline for RAG pipelines and "chat with docs" applications), use the @mdream/crawl package or the @mdream/nuxt module.
No. The crawl runs server-side, generates the llms.txt output, and discards all data after delivery. Nothing is stored or logged.
robots.txt controls which pages crawlers can access (exclusion). Sitemaps list URLs for search engines to index. llms.txt provides structured context for AI systems (inclusion): which pages matter, how they relate, and what each one covers. Think of llms.txt as an "AI sitemap" or "robots.txt for AI." Use all three together for complete discoverability.
Yes. AI search engines like Perplexity, ChatGPT Browse, and Google AI Overviews use structured data to cite your content. A well-structured llms.txt acts as an AI sitemap, making it easier for these tools to find, understand, and accurately reference your pages. This is part of what some call Generative Engine Optimization (GEO).
For WordPress, place a static llms.txt in your site root or use a plugin that generates it from your posts and pages. For Next.js, add it to public/llms.txt for a static file, or create a route handler at app/llms.txt/route.ts that generates it dynamically from your content. This generator works with any site regardless of platform.
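For the dynamic Next.js option, a route handler can look like the sketch below. The `app/llms.txt/route.ts` path and the standard `Response` return type follow Next.js route handler conventions; the hardcoded page list is a placeholder, since a real handler would build it from your content source.

```typescript
// app/llms.txt/route.ts — minimal sketch of a Next.js route handler
// that serves llms.txt as plain text. The body below is a hardcoded
// placeholder; generate it from your CMS or content collection instead.
export async function GET(): Promise<Response> {
  const body = [
    "# Example Docs",
    "",
    "> Placeholder summary for illustration.",
    "",
    "## Docs",
    "",
    "- [Getting Started](https://example.com/docs/getting-started): Setup guide",
  ].join("\n");

  return new Response(body, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
```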
llms.txt is an excellent starting point for RAG (Retrieval Augmented Generation) pipelines. Use llms.txt as a structured index, or use llms-full.txt (which embeds all content inline) as a retrieval source for vector databases. The section structure maps naturally to document chunks for embedding.
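Because the spec's structure is regular (H2 sections, one link per list line), turning an llms.txt index into retrieval records is straightforward. A sketch, assuming the standard `- [Title](url): description` link format:

```typescript
// Sketch: split an llms.txt index into per-link records, keyed by
// their H2 section, ready to fetch and embed in a vector database.
interface LinkRecord {
  section: string;
  title: string;
  url: string;
  description: string;
}

export function parseLlmsTxt(text: string): LinkRecord[] {
  const records: LinkRecord[] = [];
  let section = "";
  for (const line of text.split("\n")) {
    const h2 = line.match(/^##\s+(.*)/);
    if (h2) {
      section = h2[1].trim();
      continue;
    }
    // Spec-style link line: "- [Title](url): optional description"
    const link = line.match(/^-\s+\[([^\]]+)\]\(([^)]+)\)(?::\s*(.*))?/);
    if (link && section) {
      records.push({
        section,
        title: link[1],
        url: link[2],
        description: link[3]?.trim() ?? "",
      });
    }
  }
  return records;
}
```

Each record maps naturally to one chunk: embed the title and description for retrieval, and fetch the URL (or the matching llms-full.txt section) for the chunk body.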