The robots.txt file is a simple text file placed in your website's root directory that tells search engine crawlers which pages they can and cannot access. While it doesn't enforce access restrictions (crawlers can ignore it), major search engines respect robots.txt directives, making it an important tool for controlling how your site is crawled and indexed.
Our Robots.txt Generator helps you create properly formatted robots.txt files without needing to memorize the syntax. Simply configure your rules using the visual interface, and the tool generates valid robots.txt content ready for download. All processing happens in your browser—your configuration is never sent to any server.
Understanding Robots.txt Basics
A robots.txt file consists of one or more "records," each specifying rules for a particular user agent (crawler). Each record starts with a User-agent line identifying which crawler the rules apply to, followed by Allow and Disallow directives specifying which paths the crawler can or cannot access.
User-agent: Specifies which crawler the following rules apply to. Use * to apply rules to all crawlers, or specify individual bots like Googlebot or Bingbot.
Disallow: Tells crawlers not to access the specified path. Disallow: /admin/ prevents crawling of anything under the /admin/ directory.
Allow: Explicitly permits access to a path, useful when you want to allow a specific page within a disallowed directory.
Sitemap: Points crawlers to your XML sitemap, helping them discover all your important pages. This should be a full URL.
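The four directives above combine into a single record. A minimal sketch (the domain and paths are placeholders):

```
User-agent: *
Disallow: /private/
Allow: /private/terms.html
Sitemap: https://example.com/sitemap.xml
```

Here every crawler is blocked from /private/ except for the single page explicitly allowed, and the sitemap is declared with a full URL.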
Common Robots.txt Patterns
Allow all (default): If you want search engines to crawl your entire site, you can use an empty Disallow directive or simply not have a robots.txt file. However, including a sitemap reference is still beneficial.
Block all: To prevent all crawling (common on staging sites), use Disallow: /. Be extremely careful with this on production sites: blocked pages can no longer be crawled and will typically drop out of search results over time.
Block specific directories: Commonly blocked paths include /admin/, /private/, /tmp/, and /cgi-bin/. These contain content not meant for public indexing.
Block query parameters: Disallow: /*?* prevents crawling of URLs with query strings, which can help avoid duplicate content issues. Note that wildcard matching is an extension honored by major crawlers such as Googlebot and Bingbot rather than part of the original robots.txt standard.
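Putting these patterns together, a sketch of a typical rules block (the paths are illustrative):

```
User-agent: *
Disallow: /admin/
Disallow: /tmp/
Disallow: /cgi-bin/
Disallow: /*?*
```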
Crawl-delay Directive
The Crawl-delay directive tells crawlers to wait a specified number of seconds between requests. This can prevent server overload from aggressive crawling. Note that Googlebot doesn't support Crawl-delay—use Google Search Console to control Googlebot's crawl rate instead.
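For crawlers that honor it, Crawl-delay is written inside the record for that user agent. A sketch (the 10-second value is illustrative; tune it to your server's capacity):

```
User-agent: Bingbot
Crawl-delay: 10
```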
Important Limitations
Robots.txt is advisory, not enforceable. Malicious bots and scrapers typically ignore it entirely. For true access control, use authentication, IP blocking, or server-side restrictions. Robots.txt is designed for legitimate search engine cooperation, not security.
Pages blocked by robots.txt can still appear in search results if they're linked from other sites. The listing will show limited information since the crawler couldn't access the page content. For complete removal, use the noindex meta tag or Search Console's removal tools.
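Note that for noindex to work, the page must remain crawlable so the crawler can actually see the directive, typically as a meta tag in the page's head:

```
<meta name="robots" content="noindex">
```

Blocking the same page in robots.txt would prevent the crawler from ever reading this tag.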
Testing Your Robots.txt
After deploying your robots.txt file, test it using Google Search Console's robots.txt Tester. This shows how Google interprets your rules and whether specific URLs are blocked. Mistakes in robots.txt can accidentally hide important pages from search engines.
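You can also sanity-check rules locally before deploying. A sketch using Python's standard-library urllib.robotparser (the rules and URLs are illustrative; the Allow line is listed before the Disallow line so the result is the same under this parser's first-match ordering and Google's longest-match precedence):

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules; a real check would fetch your live /robots.txt instead.
rules = [
    "User-agent: *",
    "Allow: /admin/help.html",
    "Disallow: /admin/",
]

parser = RobotFileParser()
parser.parse(rules)

# Ask whether the generic crawler ("*") may fetch specific URLs.
print(parser.can_fetch("*", "https://example.com/about"))            # True
print(parser.can_fetch("*", "https://example.com/admin/panel"))      # False
print(parser.can_fetch("*", "https://example.com/admin/help.html"))  # True
```

This only checks your own syntax; it does not replace testing against Google's actual interpretation in Search Console.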
Common Mistakes to Avoid
- Accidentally blocking your entire site with Disallow: /
- Blocking CSS and JavaScript files that search engines need to render pages
- Using robots.txt to hide sensitive content (it's publicly visible)
- Forgetting to update robots.txt after site restructuring
- Using relative URLs for sitemap references
Common Use Cases
Block Admin Areas
Prevent search engines from crawling administrative pages, login areas, and internal tools that shouldn't appear in search results.
Staging Site Protection
Block all crawling on staging or development sites to prevent test content from appearing in search results.
Manage Crawl Budget
Direct crawlers away from low-value pages so they focus crawl budget on your most important content.
Prevent Duplicate Content
Block parameter-based URLs, print pages, or other duplicate content that might dilute your SEO.
Sitemap Declaration
Point search engines to your XML sitemap to ensure they discover all important pages.
Bot-Specific Rules
Create different rules for different crawlers, such as allowing Googlebot but blocking aggressive scrapers.
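A sketch of per-crawler records ("BadBot" is a placeholder; check a scraper's documented user-agent string before blocking it):

```
User-agent: Googlebot
Disallow:

User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /admin/
```

Each record applies only to the named crawler; a bot uses the most specific record that matches its user agent, falling back to the * record otherwise.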
Worked Examples
Standard Production Site
Input
Allow all, block /admin/, include sitemap
Output
User-agent: *
Allow: /
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
This common configuration allows full site crawling while blocking administrative pages and declaring the sitemap location.
Staging Site Block
Input
Block all robots completely
Output
User-agent: *
Disallow: /
This blocks all crawlers from accessing any page on the site. Use only for non-production environments.
Frequently Asked Questions
Where should I place the robots.txt file?
The robots.txt file must be placed in your website's root directory and accessible at example.com/robots.txt. It won't work from a subdirectory, and it doesn't apply across subdomains; each subdomain needs its own robots.txt.
Does robots.txt provide security?
No. Robots.txt is publicly readable and only provides guidance to well-behaved crawlers. Malicious bots ignore it. Never use robots.txt to hide sensitive information—use proper authentication and access controls instead.
Will blocking a page remove it from Google?
Not necessarily. If other sites link to a blocked page, it may still appear in search results with limited information. For complete removal, use the noindex meta tag or Google Search Console's URL removal tool.
How often do search engines check robots.txt?
Major search engines typically cache robots.txt for up to 24 hours. After making changes, it may take up to a day for crawlers to see the new rules. You can request a fresh fetch in Google Search Console.
Should I block CSS and JavaScript files?
Generally no. Modern search engines need to render pages to understand content. Blocking CSS/JS can prevent proper rendering and hurt your SEO. Only block these if there's a specific technical reason.
Does this tool validate my existing robots.txt?
This tool generates new robots.txt content based on your configuration. For testing existing files against specific URLs, use Google Search Console's robots.txt Tester which shows how Googlebot interprets your rules.
