robots.txt Tester - Visiblytics

robots.txt Tester

Test your robots.txt rules against any user-agent and URL. Validate Allow/Disallow directives, spot errors and see exactly which pages are blocked or permitted for crawling.

🤖 All user-agents · ✅ Allow & Disallow · ⚡ Instant validation · 🔒 Paste or fetch live
🤖 Common AI Bot Agents
| Bot | Company |
| --- | --- |
| GPTBot | OpenAI |
| anthropic-ai | Anthropic |
| PerplexityBot | Perplexity |
| CCBot | Common Crawl |

📚How to Use robots.txt Tester

  1. Load your robots.txt

    Enter your website URL to automatically fetch your live robots.txt, or paste the raw robots.txt content directly into the editor. You can edit the content to test proposed changes before deploying.

  2. Choose user-agent and enter a URL

    Select a user-agent from the list (Googlebot, Bingbot, GPTBot, or any custom bot name) and enter the page URL you want to test — for example /resources/private-post or /admin/. The user-agent selector ensures you test the exact rules relevant to each crawler.

  3. Review the result

    The tester instantly shows whether the URL is Allowed or Disallowed for the selected bot, and highlights the exact directive that caused the result so you know precisely which rule to edit.
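The three steps above can be approximated offline. The following is a minimal sketch using Python's standard `urllib.robotparser`, not the tester's own implementation; the rules in `ROBOTS_TXT` and the helper name `is_allowed` are made up for illustration. Note that `urllib.robotparser` applies the first matching rule in file order, so results can differ from crawlers that use longest-match precedence when Allow and Disallow overlap.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical draft rules, pasted as text instead of fetched from a live site.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may crawl url under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # step 1: load the rules
    return parser.can_fetch(user_agent, url)  # steps 2 and 3: pick bot + URL, get result


print(is_allowed(ROBOTS_TXT, "Googlebot", "/resources/private-post"))  # True
print(is_allowed(ROBOTS_TXT, "Googlebot", "/admin/"))                  # False
print(is_allowed(ROBOTS_TXT, "GPTBot", "/resources/private-post"))     # False
```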

💡Quick Reference

| Directive | Effect |
| --- | --- |
| Disallow: / | Blocks everything |
| Allow: / | Opens everything |
| Crawl-delay: | Throttles bot speed |
| Sitemap: | Declares XML sitemap |

Frequently Asked Questions

What is a robots.txt file?

A robots.txt file is a plain-text file placed at the root of your website (e.g. example.com/robots.txt) that tells search engine crawlers which pages or sections of your site they may or may not access. It is part of the Robots Exclusion Protocol and is respected by all major crawlers, including Googlebot, Bingbot and Baiduspider.

Does blocking a URL in robots.txt remove it from Google's index?

No — robots.txt only controls crawling, not indexing. If other sites link to a blocked page, Google may still index the URL based on those external links. To prevent a page from appearing in search results, use a noindex meta tag or X-Robots-Tag header instead.
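To check whether a page actually carries one of those noindex signals, a rough sketch along these lines may help. It uses only the Python standard library; the function name `has_noindex` is hypothetical, and the check is deliberately simplified (it ignores bot-specific meta names such as `googlebot` and user-agent-prefixed `X-Robots-Tag` values).

```python
from html.parser import HTMLParser


class _RobotsMetaFinder(HTMLParser):
    """Records whether a <meta name="robots"> tag contains 'noindex'."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots" and "noindex" in attr.get("content", "").lower():
            self.noindex = True


def has_noindex(html: str, headers: dict) -> bool:
    """Return True if the response headers or the HTML ask not to be indexed."""
    # Header names are case-insensitive, so compare in lowercase.
    header = next((v for k, v in headers.items() if k.lower() == "x-robots-tag"), "")
    if "noindex" in header.lower():
        return True
    finder = _RobotsMetaFinder()
    finder.feed(html)
    return finder.noindex


print(has_noindex('<meta name="robots" content="noindex, nofollow">', {}))  # True
print(has_noindex("<p>hello</p>", {"X-Robots-Tag": "noindex"}))            # True
print(has_noindex("<p>hello</p>", {}))                                      # False
```

Keep in mind that a crawler can only see either signal if the page is not blocked in robots.txt.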

What is the Disallow: / directive?

Disallow: / blocks the specified user-agent from crawling your entire website. It is the broadest possible restriction. If it applies to Googlebot, Google can no longer read any of your pages, and over time they typically drop out of search results or appear without a description. Always double-check this directive before deploying; it is one of the most common and damaging SEO mistakes.

What is the Crawl-delay directive?

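Crawl-delay tells a bot how many seconds to wait between consecutive requests to your server. For example, Crawl-delay: 5 asks the bot to wait 5 seconds between page fetches. Note that Googlebot does not support this directive. Google adjusts its crawl rate automatically, and the legacy crawl rate limiter in Google Search Console was retired in early 2024; to slow Googlebot temporarily, Google's documentation recommends returning 503 or 429 responses.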
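As an illustration, Python's standard `urllib.robotparser` can read the Crawl-delay value that applies to a given bot (the `crawl_delay` method is available from Python 3.6). The rules below are a made-up example, and whether a real crawler honors the value is up to that crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: only Bingbot gets an explicit delay.
CRAWL_DELAY_RULES = """\
User-agent: Bingbot
Crawl-delay: 5
Disallow: /search

User-agent: *
Disallow:
"""

delay_parser = RobotFileParser()
delay_parser.parse(CRAWL_DELAY_RULES.splitlines())

bing_delay = delay_parser.crawl_delay("Bingbot")        # seconds between requests
other_delay = delay_parser.crawl_delay("SomeOtherBot")  # falls back to the * group

print(bing_delay)   # 5
print(other_delay)  # None (no Crawl-delay set for this bot)
```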

How do I block AI crawlers like GPTBot in robots.txt?

Add a User-agent: GPTBot line followed by Disallow: / to prevent OpenAI's GPTBot from crawling your site. Similarly, User-agent: CCBot with Disallow: / blocks Common Crawl. Our tester lets you select these AI bots from the user-agent dropdown and test your rules against them specifically.
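A small sketch of that pattern: the helper below (a hypothetical `block_rules` function, not part of the tester) generates one blocking group per bot name from the AI bot list on this page, then checks the result with Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens taken from the AI bot list on this page.
AI_BOTS = ["GPTBot", "anthropic-ai", "PerplexityBot", "CCBot"]


def block_rules(bots):
    """Build robots.txt text with one 'Disallow: /' group per bot."""
    groups = [f"User-agent: {bot}\nDisallow: /" for bot in bots]
    return "\n\n".join(groups) + "\n"


ai_rules = block_rules(AI_BOTS)
print(ai_rules)

ai_parser = RobotFileParser()
ai_parser.parse(ai_rules.splitlines())

# Every listed bot should be blocked; a bot with no matching group stays allowed.
blocked = {bot: not ai_parser.can_fetch(bot, "/any-page") for bot in AI_BOTS}
print(blocked)
print(ai_parser.can_fetch("Googlebot", "/any-page"))  # True
```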

Can I have different rules for different bots?

Yes. robots.txt supports multiple User-agent sections, each with its own Allow and Disallow rules. You can allow Googlebot full access while blocking other crawlers from specific directories. The wildcard User-agent: * applies to all crawlers not otherwise specified in the file.
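To make the group selection concrete, here is a simplified sketch that splits a robots.txt file into per-bot groups and picks the one that applies. The function names `parse_groups` and `rules_for` are hypothetical, and the exact-name lookup is an assumption made for brevity; real crawlers match their product token more loosely.

```python
def parse_groups(robots_txt: str) -> dict:
    """Map each lowercased user-agent name to its list of (directive, value) rules."""
    groups = {}
    current_agents = []
    collecting_agents = False  # True while reading consecutive User-agent lines
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not collecting_agents:
                current_agents = []  # a new group starts here
                collecting_agents = True
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            collecting_agents = False
            for agent in current_agents:
                groups[agent].append((field, value))
    return groups


def rules_for(groups: dict, user_agent: str) -> list:
    """Use the group named for this bot, otherwise fall back to the * group."""
    return groups.get(user_agent.lower(), groups.get("*", []))


example = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /drafts/
Disallow: /tmp/
"""
groups = parse_groups(example)
print(rules_for(groups, "Googlebot"))  # [('disallow', '')]
print(rules_for(groups, "Bingbot"))    # [('disallow', '/drafts/'), ('disallow', '/tmp/')]
```

An empty Disallow value, as in the Googlebot group here, blocks nothing.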

What does the Allow: directive do?

Allow: explicitly permits a crawler to access a URL or path, even within a broader Disallow: section. This is useful for blocking /private/ but allowing /private/sitemap.xml. In cases of conflict, the longer, more specific rule takes precedence.
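The precedence rule can be sketched in a few lines. This is an illustrative implementation of longest-match, written by hand because Python's standard `urllib.robotparser` applies rules in file order instead. The function name `longest_match_allowed` is hypothetical, wildcard characters (`*` and `$`) are not handled, and a tie between equally long rules is resolved in favor of Allow.

```python
def longest_match_allowed(rules, path):
    """rules: list of (directive, prefix) pairs, directive is 'allow' or 'disallow'."""
    best_len = -1
    allowed = True  # no matching rule means the path may be crawled
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty value matches nothing
        is_allow = directive == "allow"
        # Longer prefix wins; on equal length, Allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


rules = [
    ("disallow", "/private/"),
    ("allow", "/private/sitemap.xml"),
]
print(longest_match_allowed(rules, "/private/sitemap.xml"))  # True
print(longest_match_allowed(rules, "/private/notes.html"))   # False
print(longest_match_allowed(rules, "/public/page"))          # True
```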

Where should I link to my XML sitemap in robots.txt?

Add a Sitemap: directive anywhere in your robots.txt file (the end is conventional), using the full absolute URL, for example: Sitemap: https://example.com/sitemap.xml. The directive is independent of any User-agent section, and you can list multiple sitemaps. This lets search engines discover your sitemap from robots.txt alone.
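For a quick check of which sitemaps a file declares, Python's standard `urllib.robotparser` exposes a `site_maps()` method (Python 3.8 and later). The second sitemap URL below is a made-up example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file with two Sitemap lines after the rule group.
SITEMAP_RULES = """\
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-news.xml
"""

sitemap_parser = RobotFileParser()
sitemap_parser.parse(SITEMAP_RULES.splitlines())

sitemaps = sitemap_parser.site_maps()  # list of URLs, or None if there are none
print(sitemaps)
# ['https://example.com/sitemap.xml', 'https://example.com/sitemap-news.xml']
```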

Does the tester fetch my actual live robots.txt?

Yes. When you enter a domain URL, the tool attempts to fetch the live file from that domain's /robots.txt path (for example, example.com/robots.txt) via a CORS proxy. You can also paste your robots.txt content directly if you want to test a draft version before going live. All processing happens in your browser.
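The location of the live file follows directly from the URL you enter. As a rough sketch (the helper name `robots_url` and the choice to default to https are assumptions, not a description of how the tool itself works):

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that page_url belongs to."""
    # Assume https when the user types a bare domain such as "example.com".
    if "://" not in page_url:
        page_url = "https://" + page_url
    parts = urlsplit(page_url)
    # robots.txt always lives at the root; path, query and fragment are dropped.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://example.com/blog/post?x=1"))  # https://example.com/robots.txt
print(robots_url("example.com"))                        # https://example.com/robots.txt
```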

What is a valid robots.txt file structure?

A valid robots.txt file is made up of groups. Each group starts with one or more User-agent: lines naming the bots the rules apply to, followed by Disallow: and/or Allow: directives. Separate groups with a blank line. Comments start with #. The file must be served at the exact path /robots.txt with a 200 HTTP status and a text/plain content type.
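A few of these structural rules can be checked mechanically. The sketch below is a minimal, hypothetical linter (`lint_robots` and `KNOWN_FIELDS` are illustrative names); it only covers the directives mentioned on this page and does not check the HTTP status or content type.

```python
# Directives discussed on this page; anything else is flagged as unknown.
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "crawl-delay", "sitemap"}


def lint_robots(robots_txt: str) -> list:
    """Return a list of human-readable problems found in the file."""
    problems = []
    seen_user_agent = False
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # comments start with '#'
        if not line:
            continue
        if ":" not in line:
            problems.append(f"line {number}: missing ':' separator")
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field not in KNOWN_FIELDS:
            problems.append(f"line {number}: unknown directive '{field}'")
        elif field == "user-agent":
            seen_user_agent = True
        elif field != "sitemap" and not seen_user_agent:
            # Sitemap may appear anywhere; rules need a User-agent line first.
            problems.append(f"line {number}: '{field}' appears before any User-agent line")
    return problems


draft = "Disallow: /admin/\nUser-agent: *\nDisalow: /tmp/\n"
for problem in lint_robots(draft):
    print(problem)
# line 1: 'disallow' appears before any User-agent line
# line 3: unknown directive 'disalow'
```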