Robots.txt Generator
Generate a robots.txt file to control search engine crawling.
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /private/
Disallow: /api/

Sitemap: https://example.com/sitemap.xml
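What a generator like this does can be sketched in a few lines of Python. This is a minimal sketch, not this tool's actual implementation: the function name build_robots_txt and the dictionary layout for rule groups are hypothetical. With the inputs below, it reproduces the sample file.

```python
# Minimal robots.txt generator sketch. The function name and the
# {"user_agents", "allow", "disallow"} group layout are hypothetical.

def build_robots_txt(groups, sitemaps=()):
    """Render robots.txt text from a list of rule groups."""
    lines = []
    for group in groups:
        if lines:
            lines.append("")  # blank line between groups, for readability
        for agent in group["user_agents"]:
            lines.append(f"User-agent: {agent}")
        for path in group.get("allow", []):
            lines.append(f"Allow: {path}")
        for path in group.get("disallow", []):
            lines.append(f"Disallow: {path}")
    if sitemaps:
        if lines:
            lines.append("")
        for url in sitemaps:
            # Sitemap lines do not belong to any user-agent group
            lines.append(f"Sitemap: {url}")
    return "\n".join(lines) + "\n"


robots = build_robots_txt(
    groups=[{
        "user_agents": ["*"],
        "allow": ["/"],
        "disallow": ["/admin/", "/private/", "/api/"],
    }],
    sitemaps=["https://example.com/sitemap.xml"],
)
print(robots, end="")
```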
What is robots.txt?
The robots.txt file is a plain text file placed at the root of a website (e.g. https://example.com/robots.txt) that tells search engine crawlers which pages or sections they are allowed or not allowed to access. It is part of the Robots Exclusion Protocol, introduced in 1994 and standardized as RFC 9309 in 2022. Every major search engine (Google, Bing, Yandex, and others) respects robots.txt by convention, but compliance is voluntary: nothing technically forces a crawler to obey it.
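Because the file always sits at the root, a crawler can derive its location from any page URL. The rules apply only to the protocol, host, and port the file is served from, so a subdomain such as blog.example.com needs its own robots.txt. A minimal Python sketch (the helper name robots_url is hypothetical):

```python
from urllib.parse import urlsplit


def robots_url(page_url):
    """Return the robots.txt URL that governs page_url.

    Only the scheme and network location (host plus optional port) are kept;
    the path, query, and fragment are replaced by /robots.txt.
    """
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


print(robots_url("https://example.com/blog/post?id=7"))  # https://example.com/robots.txt
print(robots_url("https://blog.example.com/"))           # https://blog.example.com/robots.txt
print(robots_url("http://example.com:8080/a/b"))         # http://example.com:8080/robots.txt
```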
Robots.txt syntax and directives
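User-agent: * → applies to all bots (* is the wildcard)
Disallow: /admin/ → blocks /admin/ and everything under it for all bots
Allow: / → permits any path not matched by a more specific rule; under RFC 9309 and Google's rules the longest matching path wins, so Allow: / does not override Disallow: /admin/
Crawl-delay: 10 → asks bots to wait 10 seconds between requests; not part of RFC 9309 and ignored by Google, though some crawlers such as Bingbot honor it
Sitemap: https://example.com/sitemap.xml → tells bots where the sitemap is
User-agent: Googlebot → starts a group just for Google's crawler; Googlebot then follows only this group and ignores the * group
Disallow: /private/ → blocks Googlebot from /private/
User-agent: GPTBot → OpenAI's crawler for collecting AI training data
Disallow: / → blocks GPTBot from crawling any page on the site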
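To see how these directives interact, the sketch below feeds a file built from them into Python's standard-library urllib.robotparser. It illustrates group selection: Googlebot follows only its own group, so the /admin/ rule in the * group does not apply to it, while a crawler with no group of its own (Bingbot here) falls back to the * group. The agent names and URLs are illustrative. crawl_delay() only reports the value in the file; whether a real crawler honors it is up to that crawler. Note that urllib.robotparser checks rules in file order (first match wins) and does not implement the * and $ path wildcards, so on files that rely on longest-match precedence its answers can differ from Google's.

```python
from urllib.robotparser import RobotFileParser

# Example file assembled from the directives listed above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Crawl-delay: 10

User-agent: Googlebot
Disallow: /private/

User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot uses its own group, so the * group's /admin/ rule does not apply
print(parser.can_fetch("Googlebot", "https://example.com/admin/"))    # True
print(parser.can_fetch("Googlebot", "https://example.com/private/"))  # False
# Bingbot has no group of its own, so the * group applies
print(parser.can_fetch("Bingbot", "https://example.com/admin/"))      # False
print(parser.can_fetch("GPTBot", "https://example.com/blog/"))        # False
print(parser.crawl_delay("Bingbot"))    # 10
print(parser.crawl_delay("Googlebot"))  # None
print(parser.site_maps())               # ['https://example.com/sitemap.xml']
```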
Critical mistakes to avoid
- Never block CSS and JavaScript. Blocking /wp-content/ or /assets/ prevents Google from rendering your pages correctly, which can hurt how they are indexed and ranked.
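- robots.txt does not prevent indexing; it prevents crawling. A page blocked in robots.txt can still be indexed if other sites link to it. Use <meta name="robots" content="noindex"> to prevent indexing, and keep that page crawlable: if robots.txt blocks it, crawlers never fetch the page and never see the tag.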
- robots.txt is publicly visible. It reveals your site structure to anyone who looks. Do not use it as a security measure β protected pages need proper authentication.
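- Test before deploying. An accidental "Disallow: /" under "User-agent: *" blocks all compliant crawlers from your entire site. Check changes with a robots.txt parser or validator before uploading them; Google Search Console's robots.txt report shows which robots.txt files Google has fetched and any parsing errors.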
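A pre-deploy check can catch both an accidental Disallow: / and blocked CSS or JavaScript. The sketch below uses hypothetical helpers (rules_for_all_agents, is_allowed, blocked_paths) and applies the RFC 9309 precedence rule, where the longest matching path wins and Allow wins a tie, instead of the file-order matching used by Python's urllib.robotparser. Assumptions: only "User-agent: *" groups are evaluated, the * and $ path wildcards are not supported, and the must-crawl paths are examples.

```python
# Pre-deploy check (sketch): which must-crawl paths would the "*" group block?
# Assumptions: only "User-agent: *" groups are evaluated, and the * and $ path
# wildcards are not supported.

def rules_for_all_agents(robots_txt):
    """Return (is_allow, path) rules from every group that lists User-agent: *."""
    rules, agents, in_rules = [], [], False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        key, value = key.strip().lower(), value.strip()
        if key == "user-agent":
            if in_rules:  # a user-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value)
        elif key in ("allow", "disallow"):
            in_rules = True
            if "*" in agents and value:  # an empty value matches nothing
                rules.append((key == "allow", value))
    return rules


def is_allowed(rules, path):
    """Longest matching rule wins; Allow wins a tie; no match means allowed."""
    best_len, allowed = -1, True
    for is_allow, rule_path in rules:
        if path.startswith(rule_path):
            n = len(rule_path)
            if n > best_len or (n == best_len and is_allow):
                best_len, allowed = n, is_allow
    return allowed


def blocked_paths(robots_txt, must_crawl):
    """Return the must-crawl paths that the * group would block."""
    rules = rules_for_all_agents(robots_txt)
    return [p for p in must_crawl if not is_allowed(rules, p)]


must_crawl = ["/", "/assets/site.css", "/assets/app.js"]  # example paths
good = "User-agent: *\nAllow: /\nDisallow: /admin/\nDisallow: /api/\n"
typo = "User-agent: *\nDisallow: /\n"
assets = "User-agent: *\nAllow: /\nDisallow: /assets/\n"

print(blocked_paths(good, must_crawl))    # []
print(blocked_paths(typo, must_crawl))    # ['/', '/assets/site.css', '/assets/app.js']
print(blocked_paths(assets, must_crawl))  # ['/assets/site.css', '/assets/app.js']
```

The last case is one where file-order matching would disagree: a first-match parser stops at Allow: / and reports the stylesheet as crawlable, while longest-match precedence lets Disallow: /assets/ win.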
Frequently asked questions
What is a robots.txt file?
robots.txt is a plain-text file at the root of your site that tells search-engine crawlers which paths they may or may not request. Well-behaved crawlers fetch it before crawling the rest of the site.
Does robots.txt keep a page out of Google?
Not reliably. Disallow stops crawling, but a blocked URL can still be indexed if other sites link to it. To keep a page out of search results, use a noindex meta tag instead.
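To confirm that a page actually carries a noindex directive, you can check its HTML for the robots meta tag and its response headers for X-Robots-Tag, the HTTP-header equivalent that Google also supports. This is a simplified sketch using Python's html.parser; the names RobotsMetaFinder and has_noindex are hypothetical, and it ignores crawler-specific variants such as <meta name="googlebot"> or X-Robots-Tag values prefixed with a user-agent name.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, but not values
        if tag != "meta":
            return
        attrs = {name: value or "" for name, value in attrs}
        if attrs.get("name", "").lower() == "robots":
            for directive in attrs.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html, headers=None):
    """True if the robots meta tag or an X-Robots-Tag header says noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    directives = set(finder.directives)
    # Header names are case-insensitive, so normalize the keys first
    lowered = {k.lower(): v for k, v in (headers or {}).items()}
    for directive in lowered.get("x-robots-tag", "").split(","):
        directives.add(directive.strip().lower())
    return "noindex" in directives


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(page))                                         # True
print(has_noindex("<p>Hello</p>", {"X-Robots-Tag": "noindex"}))  # True
print(has_noindex("<p>Hello</p>"))                               # False
```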
What does "Disallow: /" mean?
It tells the specified user-agent not to crawl any URL on the site. An empty Disallow (or no Disallow) allows everything.
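The difference between "Disallow: /" and an empty Disallow can be checked with Python's standard-library urllib.robotparser. The helper can_crawl and the agent name ExampleBot are hypothetical:

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, url, agent="ExampleBot"):
    """Parse robots_txt and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


url = "https://example.com/page"
print(can_crawl("User-agent: *\nDisallow: /", url))  # False: whole site blocked
print(can_crawl("User-agent: *\nDisallow:", url))    # True: empty Disallow allows all
print(can_crawl("", url))                            # True: no rules at all
```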
Should I link my sitemap in robots.txt?
Yes. Adding a Sitemap: line with the full URL of your XML sitemap helps crawlers discover all your pages efficiently.