Robots.txt Generator

📖 How to Use the Robots.txt Generator

1. Configure your global rules
Start with the User-agent: * group, which applies to every crawler that has no more specific group of its own. Add paths you want to block (Disallow) or explicitly permit (Allow). Common blocks include /admin/, /wp-admin/, /private/, /search?, and staging-environment URLs.

2. Set bot-specific rules and block AI crawlers
Use the AI Bot Blocking section to stop GPTBot, Google-Extended, Anthropic-AI, CCBot, PerplexityBot and other AI training crawlers from accessing your content. Add any custom bot rules. Note: Googlebot ignores Crawl-delay, and Google retired the Search Console crawl rate limiter in January 2024. To slow Googlebot temporarily, Google's documented approach is to return 500, 503, or 429 responses.

3. Add your sitemap URL and download
Enter your XML sitemap URL so search engines can discover your pages. Review the live robots.txt preview, then click Download, save the file as robots.txt, and upload it to your domain root so it is reachable at example.com/robots.txt.
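The three steps above can be sketched in Python. This is a minimal illustration, not the generator's actual code: build_robots_txt and its parameters are hypothetical names, and the paths, bot list, and sitemap URL are placeholders. The result is checked with the standard library's urllib.robotparser, whose prefix matching is simpler than Googlebot's.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(disallow, blocked_bots, sitemap_url):
    """Assemble robots.txt text: global rules, per-bot blocks, sitemap."""
    # Step 1: the global group for all crawlers.
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in disallow]
    # Step 2: one group per blocked bot, each disallowing the whole site.
    for bot in blocked_bots:
        lines += ["", f"User-agent: {bot}", "Disallow: /"]
    # Step 3: the sitemap line, which is independent of any group.
    lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(
    disallow=["/admin/", "/private/"],
    blocked_bots=["GPTBot", "CCBot"],
    sitemap_url="https://example.com/sitemap.xml",
)
print(robots_txt)

# Sanity-check the generated rules before uploading the file.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("Googlebot", "https://example.com/admin/login"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/blog/"))        # True
print(parser.can_fetch("GPTBot", "https://example.com/blog/"))           # False
```

Writing the returned string to a file named robots.txt at the domain root corresponds to the Download and upload step.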

🔑 Quick Reference

Directive          Googlebot support
User-agent         ✅ Yes
Disallow / Allow   ✅ Yes
Sitemap            ✅ Yes
Crawl-delay        ❌ Ignored by Google

Frequently Asked Questions

What is a robots.txt file?

Robots.txt is a plain text file at the root of your domain (example.com/robots.txt) that tells web crawlers which pages or sections of your site they may access. It follows the Robots Exclusion Protocol (standardized as RFC 9309), which all major search engines and many other bots support. Compliance is voluntary, so the file guides well-behaved crawlers rather than enforcing access control.

Does blocking in robots.txt prevent indexing?

Robots.txt blocks crawling: it tells bots not to fetch the URL. It does not prevent indexing. If another site links to a URL you have blocked in robots.txt, Google can still index that URL without crawling its content, so it may appear in results with no description. To keep a page out of the index, use a noindex meta tag on the page or an X-Robots-Tag: noindex response header, and do not block that page in robots.txt, because Google must be able to crawl the page to see the noindex directive.
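To confirm that a page actually carries the noindex directive described above, its HTML can be inspected. The following is a rough sketch using Python's standard html.parser; RobotsMetaParser and has_noindex are hypothetical names, and the check covers only the generic robots meta tag, not bot-specific meta tags or HTTP response headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives listed in <meta name="robots" content="...">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated and case-insensitive.
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def has_noindex(html):
    """Return True if the page's robots meta tag includes noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(page))  # True
```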

Does Googlebot support crawl-delay?

No. Googlebot ignores the Crawl-delay directive. The crawl rate setting that used to exist in Google Search Console was retired in January 2024; to slow Googlebot temporarily, Google's documented approach is to return 500, 503, or 429 responses. Bingbot and some other crawlers do honor Crawl-delay, so the directive still serves a purpose for those bots.
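For crawlers that do honor Crawl-delay, the value is set per user-agent group. A small sketch of reading it back with Python's urllib.robotparser (its crawl_delay method is available from Python 3.6) is shown here; the 10-second value and the bot names are illustrative assumptions only.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: bingbot
Crawl-delay: 10

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# The delay applies only to the group that declares it.
bing_delay = parser.crawl_delay("bingbot")
other_delay = parser.crawl_delay("SomeOtherBot")
print(bing_delay)   # 10
print(other_delay)  # None
```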

Should I block AI training bots?

This depends on your content strategy. User-agent tokens such as GPTBot (OpenAI), Google-Extended (Google's opt-out token for AI training, not a separate crawler), Anthropic-AI, CCBot (Common Crawl) and PerplexityBot are tied to AI products, and content collected under them may be used to train language models. To keep your content out of AI training while still allowing search indexing, disallow these specific tokens and keep the User-agent: * rules permissive. Compliance is voluntary, so robots.txt only stops bots that choose to honor it.
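One compact way to express this is a single group with several User-agent lines sharing one Disallow rule. The sketch below assumes the token list from this page (real token names change over time, so check each vendor's documentation) and uses a hypothetical helper, ai_block_group, with Python's urllib.robotparser as a rough check.

```python
from urllib.robotparser import RobotFileParser

# Token list taken from this page; treat it as a starting point, not a registry.
AI_TOKENS = ["GPTBot", "Google-Extended", "anthropic-ai", "CCBot", "PerplexityBot"]


def ai_block_group(tokens):
    """Build one rule group that disallows the whole site for every token."""
    lines = [f"User-agent: {token}" for token in tokens]
    lines.append("Disallow: /")
    return "\n".join(lines)


# A permissive global group (empty Disallow allows everything) plus the AI group.
robots_txt = "User-agent: *\nDisallow:\n\n" + ai_block_group(AI_TOKENS) + "\n"
print(robots_txt)

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
url = "https://example.com/articles/"
print(parser.can_fetch("Googlebot", url))  # True: search crawling stays open
print(parser.can_fetch("GPTBot", url))     # False: AI token is blocked
```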

What paths should most websites block?

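Common paths to block: /admin/ or /wp-admin/ (admin panels), /private/ or /internal/ (confidential content), URL parameters that create duplicate pages (/search?, /?s=, /?ref=, /?utm_), print versions (/print/), and session-based URLs. Rules match from the start of the path, so /?utm_ only covers a query string on the homepage; a wildcard pattern such as /*?utm_ is needed to cover other paths. A robots.txt file only governs the host it is served from, so a staging subdomain needs its own file. Never block your CSS, JavaScript or image files, because Googlebot needs them to render your pages correctly. For the same reason, be careful with /wp-includes/ (WordPress internals), which also holds scripts and styles used on public pages.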
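A quick way to guard against accidentally blocking render-critical files is to test a few representative asset URLs against a draft. This is a sketch under assumptions: blocked_assets is a hypothetical helper, the asset URLs are examples, and Python's urllib.robotparser does plain prefix matching without Google-style wildcards. In the sample, a blanket rule on /wp-includes/ also blocks a script stored under it.

```python
from urllib.robotparser import RobotFileParser


def blocked_assets(robots_txt, asset_urls, agent="Googlebot"):
    """Return the asset URLs that the given crawler may not fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in asset_urls if not parser.can_fetch(agent, url)]


draft = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /private/
"""

assets = [
    "https://example.com/wp-content/themes/demo/style.css",
    "https://example.com/wp-includes/js/jquery/jquery.min.js",
]
print(blocked_assets(draft, assets))
# ['https://example.com/wp-includes/js/jquery/jquery.min.js']
```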

What happens if my robots.txt has syntax errors?

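Robots.txt is lenient: most parsers skip lines they do not understand rather than failing entirely. However, a misspelled User-agent line, or rules that appear before any User-agent line, can cause rules to be dropped or attached to the wrong group, and some older parsers also treat a stray blank line as the end of a group. Always validate your robots.txt after making changes, for example with the robots.txt report in Google Search Console, which replaced the older robots.txt Tester.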
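A few of these mistakes can be caught before upload with a small lint pass. The sketch below is illustrative rather than a full validator: lint_robots_txt is a hypothetical function, and KNOWN_FIELDS is an assumed list limited to the directives in this page's Quick Reference.

```python
# Assumed set of directives; extend it for crawlers that support more.
KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}


def lint_robots_txt(text):
    """Return (line_number, message) pairs for likely mistakes."""
    problems = []
    seen_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            problems.append((number, "missing ':' separator"))
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field not in KNOWN_FIELDS:
            problems.append((number, f"unknown directive '{field}'"))
        elif field == "user-agent":
            if not value:
                problems.append((number, "empty User-agent"))
            seen_agent = True
        elif field != "sitemap" and not seen_agent:
            # Group rules need a preceding User-agent line; Sitemap does not.
            problems.append((number, f"'{field}' before any User-agent line"))
    return problems


sample = """\
Disallow: /tmp/
User-agent: *
Disalow: /admin/
Disallow /private/
Sitemap: https://example.com/sitemap.xml
"""
for number, message in lint_robots_txt(sample):
    print(f"line {number}: {message}")
```

For the sample input, the pass flags lines 1, 3, and 4 and accepts the Sitemap line.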
