robots.txt Builder & Validator — SEO Crawler Control

Guide

About robots.txt Builder

Build and validate robots.txt files. Add user-agents (Googlebot, Bingbot, GPTBot, ClaudeBot, PerplexityBot, etc.), allow/disallow paths, and the sitemap URL. The validator flags syntax errors and warns about common mistakes. Drop the output at /robots.txt at your site root.

What a robots.txt should usually have

User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /cart

# Major engines
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

# AI crawlers — choose to allow or block
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

Sitemap: https://yoursite.com/sitemap-index.xml

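The "*" group sets the default for crawlers that have no group of their own. A crawler that matches a named group follows only that group and does not inherit rules from "*". In the example above, Googlebot matches its own group and is therefore not bound by Disallow: /admin/; repeat the Disallow lines inside a named group if they should still apply to that crawler.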
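
To see how group selection plays out, here is a minimal sketch using Python's standard-library urllib.robotparser on a trimmed copy of the example. The crawler name "ExampleBot" is hypothetical and stands in for any bot without its own group; yoursite.com is the same placeholder host used above. Real crawlers may differ in details, so treat this as an illustration rather than a definitive test.

```python
from urllib.robotparser import RobotFileParser

# Trimmed copy of the example above; yoursite.com is a placeholder host.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /cart

User-agent: Googlebot
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" (hypothetical) has no named group, so the "*" rules apply to it.
unnamed_blocked = not parser.can_fetch("ExampleBot", "https://yoursite.com/admin/")

# Googlebot matches its own group, which contains only "Allow: /".
googlebot_allowed = parser.can_fetch("Googlebot", "https://yoursite.com/admin/")

print(unnamed_blocked, googlebot_allowed)
```

Both flags come out True here: the unnamed bot is kept out of /admin/, while Googlebot is not, because its named group replaces the default rather than adding to it.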

Common patterns

Block staging from indexing — User-agent: * / Disallow: / (better: serve X-Robots-Tag: noindex header)
Block private routes — one Disallow line per path: /admin/, /api/, /cart
Allow major engines, block AI training — explicit allow for Googlebot/Bingbot, disallow for GPTBot/ClaudeBot
Point at sitemap — always include the Sitemap: directive
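
The patterns above can be combined programmatically. The following is a minimal sketch of the "allow major engines, block AI training" pattern; build_robots_txt and the constant names are hypothetical helpers written for this illustration, not this tool's implementation.

```python
def build_robots_txt(groups, sitemap_url=None):
    """Render robots.txt text from (user_agent, [(directive, path), ...]) pairs."""
    blocks = []
    for user_agent, rules in groups:
        lines = [f"User-agent: {user_agent}"]
        lines += [f"{directive}: {path}" for directive, path in rules]
        blocks.append("\n".join(lines))
    if sitemap_url:
        blocks.append(f"Sitemap: {sitemap_url}")
    return "\n\n".join(blocks) + "\n"


PRIVATE = [("Disallow", "/admin/"), ("Disallow", "/api/"), ("Disallow", "/cart")]
# A named group does not inherit from "*", so private routes are repeated here.
SEARCH = [("Allow", "/")] + PRIVATE
BLOCKED = [("Disallow", "/")]

robots_txt = build_robots_txt(
    [
        ("*", PRIVATE),
        ("Googlebot", SEARCH),
        ("Bingbot", SEARCH),
        ("GPTBot", BLOCKED),
        ("ClaudeBot", BLOCKED),
    ],
    sitemap_url="https://yoursite.com/sitemap-index.xml",  # placeholder URL
)
print(robots_txt)
```

Keeping the rules as data makes it easy to flip an AI crawler between SEARCH and BLOCKED when the policy changes.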

Mistakes to avoid

Robots.txt as a security mechanism — it is not. Anything sensitive needs auth.
Disallowing CSS/JS — Google rendering needs them. Don’t block /_next/, /static/, /_astro/, etc.
Sitemap URL relative — must be absolute (https://...).
Overlapping allow/disallow — most crawlers honor the most specific (longest) matching path; an explicit Allow wins on a tie.
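
The overlap rule can be sketched in a few lines. This is an illustration of longest-match precedence using plain prefix matching only (no wildcards); is_allowed is a hypothetical helper, and individual crawlers may resolve edge cases differently.

```python
def is_allowed(rules, path):
    """rules: list of ("allow" | "disallow", path_prefix) for one user-agent group."""
    best_len = -1
    allowed = True  # no matching rule means the path may be crawled
    for kind, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # empty values and non-matching prefixes are skipped
        is_allow = kind == "allow"
        # The longer prefix wins; on equal length, allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


rules = [
    ("disallow", "/api/"),
    ("allow", "/api/public/"),
    ("disallow", "/cart"),
    ("allow", "/cart"),
]

print(is_allowed(rules, "/api/private/keys"))  # only /api/ matches
print(is_allowed(rules, "/api/public/docs"))   # /api/public/ is the longer match
print(is_allowed(rules, "/cart"))              # equal length, allow wins the tie
```

The result does not depend on the order of the rules, which is why reordering lines in a group is not a reliable way to fix an overlap.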

Common workflows

New site launch. Build robots.txt with all major engines allowed, AI training blocked or allowed per policy, sitemap declared.

Audit an existing robots.txt. Paste the file in to validate it. The tool flags unrecognized directives and common errors.
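
For a rough idea of what such an audit involves, here is a minimal sketch of a line-by-line check. It is not this tool's validator: validate_robots_txt, the KNOWN_DIRECTIVES set, and the specific messages are assumptions chosen for illustration.

```python
# Assumed set of directives to accept; crawl-delay is non-standard but widely seen.
KNOWN_DIRECTIVES = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def validate_robots_txt(text):
    """Return a list of (line_number, message) findings."""
    findings = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            findings.append((number, "missing ':' separator"))
            continue
        name, value = (part.strip() for part in line.split(":", 1))
        key = name.lower()
        if key not in KNOWN_DIRECTIVES:
            findings.append((number, f"unrecognized directive '{name}'"))
        elif key == "user-agent":
            seen_user_agent = True
        elif key in {"allow", "disallow"}:
            if not seen_user_agent:
                findings.append((number, "rule appears before any User-agent line"))
            if value and not value.startswith(("/", "*")):
                findings.append((number, "path should start with '/'"))
        elif key == "sitemap" and not value.lower().startswith(("http://", "https://")):
            findings.append((number, "Sitemap URL must be absolute"))
    return findings


sample = """\
User-agent: *
Dissallow: /admin/
Disallow: api/
Sitemap: /sitemap.xml
"""
for number, message in validate_robots_txt(sample):
    print(f"line {number}: {message}")
```

The sample deliberately contains a misspelled directive, a path without a leading slash, and a relative sitemap URL, so each of lines 2 to 4 produces one finding.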

Block staging from indexing. During pre-launch, deploy a Disallow: / robots.txt or (better) serve X-Robots-Tag: noindex headers.
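
One way to serve the header is at the application layer. The sketch below assumes the staging site is a Python WSGI app; add_noindex_header and staging_app are hypothetical names, and the same effect is often achieved in the web server or CDN configuration instead.

```python
def add_noindex_header(app):
    """Wrap a WSGI app so every response carries X-Robots-Tag: noindex."""
    def wrapped(environ, start_response):
        def start_with_header(status, headers, exc_info=None):
            # Replace any existing X-Robots-Tag value rather than duplicating it.
            headers = [(k, v) for k, v in headers if k.lower() != "x-robots-tag"]
            headers.append(("X-Robots-Tag", "noindex"))
            return start_response(status, headers, exc_info)
        return app(environ, start_with_header)
    return wrapped


def staging_app(environ, start_response):
    # Hypothetical stand-in for the real staging site.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"staging build\n"]


application = add_noindex_header(staging_app)
```

A crawler has to be able to fetch a page to see this header, so the header approach works only when robots.txt does not also disallow those URLs.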

Frequently asked questions

What does robots.txt actually do?

A polite request to crawlers. Compliant bots (Google, Bing, most AI crawlers) honor it. Malicious crawlers ignore it. For hard blocks, use authentication or rate-limiting.

Wildcard behavior?

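Disallow: /admin/* matches everything under /admin/. Rules are prefix matches, so Disallow: /admin/ has the same effect without the trailing *. Most crawlers also accept $ to anchor a pattern to the end of the URL.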
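
As an illustration of these semantics, the sketch below translates a rule pattern into a regular expression. rule_to_regex and matches are hypothetical helpers; the translation assumes * means "any sequence of characters" and a trailing $ means "end of URL", which is how the major crawlers document it.

```python
import re


def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal parts, then join them with ".*" where "*" appeared.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + (r"\Z" if anchored else ""))


def matches(pattern, path):
    # re.match anchors at the start, so an unanchored pattern is a prefix match.
    return rule_to_regex(pattern).match(path) is not None


print(matches("/admin/*", "/admin/users"))
print(matches("/admin/", "/admin/users"))
print(matches("/*.pdf$", "/files/report.pdf"))
print(matches("/*.pdf$", "/files/report.pdf?download=1"))
```

The first three calls match; the last does not, because $ requires the URL to end right after .pdf.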

How do I block AI crawlers?

User-agent matches: GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot, Google-Extended (training data), CCBot (Common Crawl). Set Disallow: / per agent to opt out.

Sitemap declaration?

Add a Sitemap: https://yoursite.com/sitemap.xml line, conventionally at the bottom of the file. The directive stands on its own and is not tied to any user-agent block. It points search engines at your sitemap.

Where does robots.txt go?

Always at the site root: /robots.txt. Anywhere else, crawlers ignore it.
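
Because the location is fixed, the robots.txt URL for any page can be derived from the page URL alone. A minimal sketch, with robots_url as a hypothetical helper and yoursite.com as a placeholder host:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL that governs page_url."""
    parts = urlsplit(page_url)
    # Keep scheme, host, and port; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://yoursite.com/blog/post?id=1"))
print(robots_url("https://shop.yoursite.com:8443/cart"))
```

Each scheme, host, and port combination has its own robots.txt, so a subdomain needs its own file.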

Does noindex go in robots.txt?

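No. noindex belongs in a meta tag or an X-Robots-Tag header. robots.txt blocks crawling; noindex blocks indexing. A crawler has to fetch a page to see its noindex, so a URL that is disallowed in robots.txt can still end up indexed from links alone.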

Last updated: 2025-01-15