Generate a production-ready robots.txt file with a live preview. Includes an AI bot blocking section (GPTBot, Google-Extended, Anthropic-AI and more), per-bot rules, Disallow/Allow paths, Crawl-delay warnings and sitemap declaration.
Start with the User-agent: * section that applies to all crawlers. Add paths you want to block (Disallow) or explicitly allow (Allow). Common blocks include /admin/, /wp-admin/, /private/, /search?, and staging-environment URLs.
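As a rough illustration of what this step produces, the Python sketch below assembles the wildcard group from two path lists. The helper name build_group and the sample paths are assumptions made for the example, not the generator's actual code.

```python
def build_group(user_agent, disallow=(), allow=()):
    """Return one robots.txt rule group as text (hypothetical helper)."""
    lines = [f"User-agent: {user_agent}"]
    # Allow lines go first so parsers that stop at the first match still honor them.
    lines += [f"Allow: {path}" for path in allow]
    lines += [f"Disallow: {path}" for path in disallow]
    return "\n".join(lines)


wildcard_group = build_group(
    "*",
    disallow=["/admin/", "/private/", "/search?"],
    allow=["/admin/public/"],  # example exception inside a blocked folder
)
print(wildcard_group)
```

Keeping the group as plain lines makes it easy to show in a live preview and to join with other groups later.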
Use the AI Bot Blocking section to prevent GPTBot, Google-Extended, Anthropic-AI, CCBot, PerplexityBot and other AI crawlers from accessing your content, then add any custom bot rules. Note: Googlebot ignores Crawl-delay. Google retired the Search Console crawl rate limiter in early 2024 and now sets its crawl rate automatically, so the directive only matters for crawlers that honor it.
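The AI blocking section can be pictured as one full-site Disallow group per bot. The sketch below is a minimal version of that idea; the list AI_BOTS reuses the names mentioned above, and the helper name block_bots is an assumption for the example.

```python
# Bot names taken from the text above; extend the list as needed.
AI_BOTS = ["GPTBot", "Google-Extended", "Anthropic-AI", "CCBot", "PerplexityBot"]


def block_bots(bots):
    """Return robots.txt text with one full-site Disallow group per bot."""
    groups = [f"User-agent: {bot}\nDisallow: /" for bot in bots]
    # A blank line between groups keeps the file readable for people and parsers.
    return "\n\n".join(groups)


ai_section = block_bots(AI_BOTS)
print(ai_section)
```

Each bot gets its own group here for clarity; listing several User-agent lines above a single Disallow line is an equivalent, shorter form.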
Enter the full, absolute URL of your XML sitemap (for example https://example.com/sitemap.xml); this helps search engines discover all your pages. Review the live robots.txt preview, then click Download to save the file as robots.txt and upload it to your domain root so it is accessible at example.com/robots.txt.
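To show how the sitemap declaration fits into the finished file, the sketch below appends a Sitemap line and reads it back with Python's standard urllib.robotparser module. The helper name add_sitemap and the example URL are assumptions; site_maps() requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser


def add_sitemap(robots_txt, sitemap_url):
    """Append a Sitemap line to existing robots.txt text (hypothetical helper)."""
    return robots_txt.rstrip("\n") + f"\n\nSitemap: {sitemap_url}\n"


robots_txt = add_sitemap(
    "User-agent: *\nDisallow: /admin/",
    "https://example.com/sitemap.xml",  # placeholder URL
)

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
declared = parser.site_maps()  # list of declared sitemap URLs, or None if there are none
print(declared)
```

Reading the value back is a cheap way to confirm the line was written in a form a parser recognizes before the file is uploaded.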
Robots.txt is a plain text file at the root of your domain (example.com/robots.txt) that tells web crawlers which pages or sections of your site they may access. It follows the Robots Exclusion Protocol, standardized as RFC 9309 and supported by all major search engines and many other bots. Compliance is voluntary, so it is a set of instructions for well-behaved crawlers rather than an access control mechanism.
Robots.txt blocks crawling: it tells bots not to fetch the URL. It does not prevent indexing. If another site links to a URL you have blocked in robots.txt, Google can still index that URL without crawling its content. To keep a page out of the index, use a noindex meta tag (or an X-Robots-Tag response header) and leave the page crawlable, because Google has to fetch the page to see the noindex rule.
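No. Googlebot ignores the Crawl-delay directive. The crawl rate setting that Google Search Console used to offer was retired in early 2024; Google now adjusts its crawl rate automatically based on how your server responds, and its documentation suggests temporarily returning 503 or 429 responses if you need to slow crawling urgently. Crawl-delay is still honored by Bingbot and some other crawlers, so it serves a purpose for those bots.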
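Because Crawl-delay only affects crawlers that choose to honor it, it is usually scoped to a specific bot's group. The sketch below uses Python's standard urllib.robotparser to show how a parser that supports the directive reads it; the 10-second value and the bot name SomeOtherBot are assumptions for the example.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: bingbot
Crawl-delay: 10

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Delay declared in the bingbot group.
bing_delay = parser.crawl_delay("bingbot")
# Falls back to the wildcard group, which declares no delay, so this is None.
other_delay = parser.crawl_delay("SomeOtherBot")
print(bing_delay, other_delay)
```

This reflects how Python's parser interprets the file; each real crawler decides for itself whether and how to apply the value.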
This depends on your content strategy. AI-related user agents such as GPTBot (OpenAI), Anthropic-AI, CCBot (Common Crawl) and PerplexityBot collect content for AI products, which can include training language models. Google-Extended (Google AI) is a control token rather than a separate crawler: disallowing it opts your content out of Google's AI model training without affecting Google Search. If you want to keep your content out of AI training while still allowing search indexing, block these specific user agents and keep the User-agent: * rules permissive.
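One way to check that such a setup behaves as intended is to parse the file and ask which agents may fetch a given page. The sketch below does this with Python's standard urllib.robotparser; the file contents and URLs are assumptions for the example.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

page = "https://example.com/blog/post"
gptbot_allowed = parser.can_fetch("GPTBot", page)        # blocked by its own group
googlebot_allowed = parser.can_fetch("Googlebot", page)  # falls through to the * group
googlebot_admin = parser.can_fetch("Googlebot", "https://example.com/admin/")
print(gptbot_allowed, googlebot_allowed, googlebot_admin)
```

A crawler with its own group follows only that group, which is why the search crawler is unaffected by the AI bot rules and still sees the permissive wildcard rules.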
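Common paths to block: /admin/ or /wp-admin/ (admin panels), /private/ or /internal/ (confidential content), URL parameters that create duplicate pages (/search?, /?s=, /?ref=, /?utm_), print versions (/print/), session-based URLs, and staging paths. A staging subdomain needs its own robots.txt, because the file only applies to the host that serves it. /wp-includes/ (WordPress internals) is often listed too, but it contains scripts and stylesheets, so check that your pages still render before blocking it. Never block your CSS, JavaScript or image files, since Googlebot needs these to render your pages correctly.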
Robots.txt is lenient: most parsers skip lines they do not understand rather than failing entirely. However, a malformed User-agent line can detach the rules below it from their intended bot, and some parsers also rely on blank lines to separate rule groups. After making changes, check the file in the robots.txt report in Google Search Console (the older robots.txt Tester has been retired).
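Before uploading, a quick local check can catch the kinds of mistakes described above. The following is a minimal lint sketch, not a full validator: the function name lint_robots, the set of known fields, and the messages are all assumptions made for the example.

```python
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "crawl-delay", "sitemap"}
RULE_FIELDS = {"allow", "disallow", "crawl-delay"}


def lint_robots(text):
    """Return a list of (line_number, message) pairs for suspicious lines."""
    problems = []
    in_group = False  # becomes True once a User-agent line has opened a group
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        field, sep, _value = line.partition(":")
        field = field.strip().lower()
        if not sep:
            problems.append((number, "missing ':' separator"))
        elif field not in KNOWN_FIELDS:
            problems.append((number, f"unknown field '{field}'"))
        elif field == "user-agent":
            in_group = True
        elif field in RULE_FIELDS and not in_group:
            problems.append((number, "rule appears before any User-agent line"))
    return problems


sample = "Disallow: /tmp/\nUser agent GPTBot\nUser-agent: *\nDisalow: /admin/\n"
issues = lint_robots(sample)
for number, message in issues:
    print(f"line {number}: {message}")
```

A check like this complements, rather than replaces, testing the live file with the search engine's own tools, since each crawler applies its own parsing rules.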