The Complete Guide to robots.txt for SEO

ToolsPilot Team · February 3, 2026 · 4 min read

The robots.txt file is a simple text file that tells search engine crawlers which pages they can and cannot access on your website. It's one of the most basic yet powerful SEO tools — and one of the most commonly misconfigured.

What Is robots.txt?

The robots.txt file lives at the root of your domain (e.g., https://example.com/robots.txt) and follows the Robots Exclusion Protocol. When a search engine crawler visits your site, it checks this file first before crawling any pages.

Important: robots.txt is a suggestion, not a security measure. Well-behaved crawlers (Google, Bing) follow it, but malicious bots can ignore it entirely. Never use robots.txt to hide sensitive content — use authentication instead.

Basic Syntax

A robots.txt file consists of one or more rule blocks, each starting with a User-agent directive:

User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /admin/public/

Sitemap: https://example.com/sitemap.xml
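You can sanity-check rules like these with Python's built-in urllib.robotparser. One caveat: Python's parser applies rules in file order (first match wins), while Google uses longest-path precedence, so results can differ for overlapping Allow/Disallow rules.

```python
import urllib.robotparser

# The same rule block shown above
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /admin/public/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("MyCrawler", "/admin/dashboard"))  # False: under /admin/
print(rp.can_fetch("MyCrawler", "/blog/post"))        # True: no rule matches
```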

Directives

  • User-agent — which crawler the rules apply to (e.g., User-agent: Googlebot)
  • Disallow — blocks crawling of a path (e.g., Disallow: /admin/)
  • Allow — explicitly allows crawling, overriding a broader Disallow (e.g., Allow: /admin/public/)
  • Sitemap — points to your XML sitemap (e.g., Sitemap: https://example.com/sitemap.xml)
  • Crawl-delay — seconds between requests; not supported by Google (e.g., Crawl-delay: 10)

Common robots.txt Examples

Allow Everything (Default)

User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml

An empty Disallow means "disallow nothing" — crawlers can access everything.

Block All Crawlers

User-agent: *
Disallow: /

This blocks all crawlers from all pages. Use this for staging or development environments.

Block Specific Directories

User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /internal/
Disallow: /tmp/

Sitemap: https://example.com/sitemap.xml

Block Specific Crawlers

# Block AI training bots
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: CCBot
Disallow: /

# Allow search engines
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

Sitemap: https://example.com/sitemap.xml

E-Commerce Site

User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /search?
Disallow: /api/
Allow: /

Sitemap: https://example.com/sitemap.xml

WordPress Site

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /author/
Disallow: /?s=
Disallow: /search/

Sitemap: https://example.com/sitemap.xml

Pattern Matching

Google and Bing support wildcard patterns:

Asterisk (*) — Matches Any Sequence

# Block all PDF files
Disallow: /*.pdf$

# Block all URLs with query parameters
Disallow: /*?

# Block URLs containing "print"
Disallow: /*print*

Dollar Sign ($) — End of URL

# Block exactly /about but not /about/team
Disallow: /about$
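The matching behavior of both wildcards can be sketched in a few lines of Python. This is an illustrative reimplementation, not Google's actual matcher: '*' becomes '.*' in a regex, and a trailing '$' anchors the match to the end of the URL.

```python
import re

def robots_pattern_matches(pattern: str, path: str) -> bool:
    """Check a URL path against a robots.txt pattern:
    '*' matches any sequence of characters, and a trailing '$'
    anchors the match to the end of the URL."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then splice '*' back in as '.*'
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"  # must consume the rest of the path
    return re.match(regex, path) is not None

print(robots_pattern_matches("/*.pdf$", "/files/report.pdf"))      # True
print(robots_pattern_matches("/*.pdf$", "/files/report.pdf?v=2"))  # False
print(robots_pattern_matches("/about$", "/about/team"))            # False
```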

robots.txt vs Meta Robots vs X-Robots-Tag

  • robots.txt — site-wide; blocks crawling of paths
  • Meta robots — per page; controls indexing and link following
  • X-Robots-Tag — HTTP header; controls indexing for non-HTML files

Key difference: robots.txt prevents crawling, while meta robots controls indexing. A page blocked by robots.txt might still appear in search results (as a URL without a snippet) if other pages link to it.

To fully remove a page from search results, use:

<meta name="robots" content="noindex, nofollow" />

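For non-HTML files such as PDFs, which can't carry a meta tag, the same directive can be sent as an HTTP header. A minimal nginx sketch (Apache's mod_headers can do the equivalent with Header set):

```nginx
# Keep all PDFs out of search results (nginx example)
location ~* \.pdf$ {
    add_header X-Robots-Tag "noindex, nofollow";
}
```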
Learn more about meta tags in our Meta Tags SEO Guide.

Common Mistakes

1. Blocking CSS and JavaScript

# DON'T do this
Disallow: /css/
Disallow: /js/

Google needs to render your pages to understand them. Blocking CSS and JS prevents proper rendering and hurts your rankings.

2. Blocking Your Entire Site Accidentally

# This blocks EVERYTHING
User-agent: *
Disallow: /

This is correct for staging environments but catastrophic for production. Always double-check before deploying.

3. Using robots.txt for Security

robots.txt is publicly accessible — anyone can read it at yoursite.com/robots.txt. Listing paths you want to hide actually tells attackers where to look.

4. Forgetting the Sitemap Directive

Always include your sitemap URL:

Sitemap: https://example.com/sitemap.xml

This helps crawlers discover all your pages, especially new or deeply nested ones.

5. Blocking Pages You Want Indexed

If you Disallow a page, search engines won't crawl it, so they can't read its content. That includes any noindex tag on the page: a crawler that can't fetch the page will never see the tag. Make sure you're not accidentally blocking pages you want indexed.
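One way to guard against this mistake is a pre-deploy check with Python's standard-library parser. The paths below are hypothetical placeholders; substitute the pages you actually care about.

```python
import urllib.robotparser

def blocked_paths(robots_txt: str, paths, agent: str = "Googlebot"):
    """Return the subset of paths that this robots.txt blocks
    for the given crawler."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return [p for p in paths if not rp.can_fetch(agent, p)]

rules = "User-agent: *\nDisallow: /admin/\n"
# Placeholder paths -- list your own important pages here.
print(blocked_paths(rules, ["/", "/products/widget", "/admin/login"]))
# -> ['/admin/login']
```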

Testing Your robots.txt

Google Search Console

Search Console includes a robots.txt report that shows which robots.txt files Google has found for your site, when they were last crawled, and any parsing errors or warnings. To check whether a specific URL is blocked, use the URL Inspection tool.

Manual Testing

You can test your robots.txt by visiting https://yoursite.com/robots.txt directly. Make sure:

  • The file is accessible (returns HTTP 200)
  • The syntax is correct
  • Your important pages aren't blocked
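The syntax check can be partially scripted. Below is a minimal linter sketch; the recognized field names are the common ones, not an exhaustive list.

```python
KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}

def lint_robots(robots_txt: str):
    """Return (line_number, message) pairs for suspicious lines."""
    issues = []
    for lineno, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue  # blank and comment-only lines are fine
        if ":" not in line:
            issues.append((lineno, "missing ':' separator"))
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field not in KNOWN_FIELDS:
            issues.append((lineno, f"unknown field '{field}'"))
    return issues

good = "User-agent: *\nDisallow: /admin/\n\nSitemap: https://example.com/sitemap.xml\n"
print(lint_robots(good))                  # -> []
print(lint_robots("Dissallow: /admin/"))  # -> [(1, "unknown field 'dissallow'")]
```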

Generate Your robots.txt

Writing robots.txt manually is straightforward for simple sites, but larger sites with many directories benefit from a structured approach. Use our Robots.txt Generator to create a properly formatted file with the right directives for your site type.

Our generator supports:

  • Common presets (WordPress, e-commerce, SPA)
  • Custom Allow/Disallow rules
  • Multiple User-agent blocks
  • AI bot blocking options
  • Automatic Sitemap directive

Best Practices

  1. Keep it simple — only block what truly shouldn't be crawled
  2. Always include a Sitemap directive — help crawlers find your content
  3. Don't block CSS or JavaScript — search engines need them for rendering
  4. Test after every change — review the robots.txt report in Google Search Console
  5. Review periodically — as your site grows, update your rules
  6. Use specific paths — Disallow: /admin/ is better than broad patterns
  7. Consider AI crawlers — decide if you want to allow or block GPTBot, CCBot, etc.

Wrapping Up

A well-configured robots.txt file helps search engines crawl your site efficiently, focusing their budget on your most important pages. Keep it simple, test your rules, and review it regularly.

Generate your robots.txt file quickly with our free Robots.txt Generator, and check your other SEO meta tags with the Meta Tag Generator.
