About robots.txt Builder
Build and validate <code>robots.txt</code> files. Add user-agents (Googlebot, Bingbot, GPTBot, ClaudeBot, PerplexityBot, etc.), allow / disallow paths, and the sitemap URL. The validator flags syntax errors and warns about common mistakes. Serve the output as <code>/robots.txt</code> from your site root.
What a robots.txt should usually have
User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /cart
# Major engines
User-agent: Googlebot
Allow: /
User-agent: Bingbot
Allow: /
# AI crawlers — choose to allow or block
User-agent: GPTBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: PerplexityBot
Allow: /
Sitemap: https://yoursite.com/sitemap-index.xml
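The "*" block sets the default. A named user-agent block overrides it entirely: a crawler follows only the most specific block that matches its name, so in this example Googlebot is not bound by the Disallow lines under "*". Repeat them in the named block if they should apply there too.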
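The override can be checked with a short script. This is a minimal sketch that feeds an abbreviated version of the file above to Python's standard `urllib.robotparser`; the crawler name `SomeOtherBot` is a made-up stand-in for any agent without its own block. That parser's path matching is simpler than what major search engines implement, so treat this as an illustration of block selection only.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated copy of the example file, kept to two blocks for brevity.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /cart

User-agent: Googlebot
Allow: /

Sitemap: https://yoursite.com/sitemap-index.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# SomeOtherBot (hypothetical name) has no block of its own, so it gets the "*" rules.
default_can_fetch_admin = parser.can_fetch("SomeOtherBot", "https://yoursite.com/admin/")

# Googlebot has a named block, which is used instead of the "*" rules.
googlebot_can_fetch_admin = parser.can_fetch("Googlebot", "https://yoursite.com/admin/")

print(default_can_fetch_admin)    # False
print(googlebot_can_fetch_admin)  # True
```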
Common patterns
- Block staging from indexing: <code>User-agent: *</code> / <code>Disallow: /</code> (better: serve an <code>X-Robots-Tag: noindex</code> header)
- Block private routes: <code>Disallow: /admin/</code>, <code>/api/</code>, <code>/cart</code>
- Allow major engines, block AI training: explicit allow for Googlebot/Bingbot, disallow for GPTBot/ClaudeBot
- Point at sitemap: always include the <code>Sitemap:</code> directive
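The patterns above combine naturally into one generated file. The following is a rough sketch of how such a file could be assembled; the function name `build_robots_txt` and the shape of its `groups` argument are assumptions made for illustration, not the builder's actual internals.

```python
def build_robots_txt(groups, sitemap_url=None):
    """Render robots.txt text from (user_agent, rules) pairs.

    `groups` is a list of (user_agent, [(directive, path), ...]) tuples.
    This data shape is an assumption made for the sketch.
    """
    lines = []
    for user_agent, rules in groups:
        lines.append(f"User-agent: {user_agent}")
        for directive, path in rules:
            lines.append(f"{directive}: {path}")
        lines.append("")  # blank line between blocks for readability
    if sitemap_url:
        lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines).rstrip("\n") + "\n"


# Private routes are repeated in each named block, because a named block
# replaces the "*" block for that crawler.
PRIVATE = [("Disallow", "/admin/"), ("Disallow", "/api/"), ("Disallow", "/cart")]

robots_txt = build_robots_txt(
    [
        ("*", PRIVATE),
        ("Googlebot", PRIVATE + [("Allow", "/")]),
        ("Bingbot", PRIVATE + [("Allow", "/")]),
        ("GPTBot", [("Disallow", "/")]),
        ("ClaudeBot", [("Disallow", "/")]),
    ],
    sitemap_url="https://yoursite.com/sitemap-index.xml",
)
print(robots_txt)
```

Listing the Disallow lines before `Allow: /` keeps the result the same for crawlers that pick the longest match and for simpler parsers that stop at the first matching rule.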
Mistakes to avoid
- Robots.txt as a security mechanism: it is not one. Anything sensitive needs auth.
- Disallowing CSS/JS: Google needs them to render pages. Don’t block <code>/_next/</code>, <code>/static/</code>, <code>/_astro/</code>, etc.
- Relative Sitemap URL: it must be absolute (<code>https://...</code>).
- Overlapping allow/disallow: most crawlers honor the most specific (longest) matching path; an explicit allow wins on a tie.
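The precedence rule for overlapping allow/disallow lines is easier to see in code. This is a minimal sketch that assumes plain path prefixes only (no wildcards) and a single block's rules; the function name `is_allowed` and the sample paths are illustrative, and individual crawlers may differ in the details.

```python
def is_allowed(rules, path):
    """Decide whether `path` may be crawled under one block's rules.

    `rules` is a list of (directive, prefix) pairs, e.g. ("Disallow", "/admin/").
    Only plain prefixes are handled here; wildcards are left out on purpose.
    """
    best_len = -1
    allowed = True  # no matching rule means the path is allowed
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty pattern matches nothing
        is_allow = directive.lower() == "allow"
        longer = len(prefix) > best_len
        tie_won_by_allow = len(prefix) == best_len and is_allow
        if longer or tie_won_by_allow:
            best_len = len(prefix)
            allowed = is_allow
    return allowed


rules = [
    ("Disallow", "/docs/"),
    ("Allow", "/docs/public/"),
    ("Disallow", "/search"),
    ("Allow", "/search"),
]

print(is_allowed(rules, "/docs/internal/plan.html"))  # False: only /docs/ matches
print(is_allowed(rules, "/docs/public/guide.html"))   # True: longer Allow wins
print(is_allowed(rules, "/search"))                   # True: tie, Allow wins
print(is_allowed(rules, "/about"))                    # True: nothing matches
```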
Common workflows
New site launch. Build robots.txt with all major engines allowed, AI training blocked or allowed per policy, sitemap declared.
Audit an existing robots.txt. Paste it into the tool to validate it. The tool flags unrecognized directives and common errors.
Block staging from indexing. During pre-launch, deploy a Disallow: / robots.txt or (better) serve X-Robots-Tag: noindex headers.
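For the audit workflow, the kinds of checks involved can be sketched in a few lines. This is not the tool's validator: the function name `lint_robots_txt`, the messages, and the set of directives treated as known (including the non-standard `Crawl-delay`) are assumptions chosen for illustration.

```python
from urllib.parse import urlparse

# Directives this sketch treats as recognized (an assumption, not a standard list).
KNOWN_DIRECTIVES = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def lint_robots_txt(text):
    """Return a list of (line_number, message) findings for robots.txt text."""
    findings = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            findings.append((number, "missing ':' separator"))
            continue
        name, value = (part.strip() for part in line.split(":", 1))
        name = name.lower()
        if name not in KNOWN_DIRECTIVES:
            findings.append((number, f"unrecognized directive '{name}'"))
        elif name == "user-agent":
            seen_user_agent = True
        elif name in {"allow", "disallow"}:
            if not seen_user_agent:
                findings.append((number, "rule appears before any User-agent line"))
            if value and not value.startswith(("/", "*")):
                findings.append((number, "path should start with '/'"))
        elif name == "sitemap":
            parsed = urlparse(value)
            if parsed.scheme not in {"http", "https"} or not parsed.netloc:
                findings.append((number, "Sitemap URL must be absolute"))
    return findings


sample = """\
Disallow: /tmp/
User-agent: *
Disalow: /admin/
Disallow: cart
Sitemap: /sitemap.xml
"""

for number, message in lint_robots_txt(sample):
    print(f"line {number}: {message}")
```

On the sample input this reports lines 1, 3, 4, and 5: a rule before any user-agent, a misspelled directive, a path without a leading slash, and a relative sitemap URL.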
Frequently asked questions
What does robots.txt actually do?
It tells crawlers which paths they may and may not fetch. It controls crawling, not indexing, and it is not a security mechanism.
Wildcard behavior?
<code>Disallow: /admin/*</code> matches everything under <code>/admin/</code>. Most crawlers also accept <code>$</code> for end-of-URL.
How do I block AI crawlers?
GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot, Google-Extended (training data), CCBot (Common Crawl). Set <code>Disallow: /</code> per agent to opt out.
Sitemap declaration?
Add a <code>Sitemap: https://yoursite.com/sitemap.xml</code> line, conventionally at the bottom. It can sit in any user-agent block and points search engines at your sitemap.
Where does robots.txt go?
<code>/robots.txt</code>, at the site root. Anywhere else, crawlers ignore it.
Does noindex go in robots.txt?
No. <code>noindex</code> is a meta tag (or an <code>X-Robots-Tag</code> header). robots.txt blocks crawling; noindex blocks indexing.
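The wildcard behavior described in the FAQ can be approximated by translating each pattern into a regular expression. This is a minimal sketch under the assumption that `*` matches any run of characters and a trailing `$` anchors the end of the URL; the helper names `pattern_to_regex` and `matches` are illustrative, and individual crawlers may handle edge cases differently.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end of the URL.
    Everything else is matched literally, as a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def matches(pattern, path):
    # re.match anchors at the start of the string, which gives prefix semantics.
    return pattern_to_regex(pattern).match(path) is not None


print(matches("/admin/*", "/admin/users/42"))              # True
print(matches("/admin/", "/admin/users/42"))               # True: a plain prefix already matches
print(matches("/*.pdf$", "/files/report.pdf"))             # True
print(matches("/*.pdf$", "/files/report.pdf?download=1"))  # False: URL does not end in .pdf
```

The second call shows that, under prefix matching, the trailing `*` in `/admin/*` is redundant: `/admin/` alone covers the same URLs.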
Last updated: 2025-01-15