The Complete Guide to robots.txt for SEO

ToolsPilot Team · February 3, 2026 · 4 min read

The robots.txt file is a simple text file that tells search engine crawlers which pages they can and cannot access on your website. It's one of the most basic yet powerful SEO tools — and one of the most commonly misconfigured.

What Is robots.txt?

The robots.txt file lives at the root of your domain (e.g., https://example.com/robots.txt) and follows the Robots Exclusion Protocol. When a search engine crawler visits your site, it checks this file first before crawling any pages.

Important: robots.txt is a suggestion, not a security measure. Well-behaved crawlers (Google, Bing) follow it, but malicious bots can ignore it entirely. Never use robots.txt to hide sensitive content — use authentication instead.

Basic Syntax

A robots.txt file consists of one or more rule blocks, each starting with a User-agent directive:

User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /admin/public/

Sitemap: https://example.com/sitemap.xml
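You can sanity-check rules like these with Python's built-in urllib.robotparser. One caveat: Python's parser applies rules in file order (first match wins), while Google uses longest-path precedence, so results can differ for overlapping Allow/Disallow rules.

```python
import urllib.robotparser

# The same rule block shown above
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /admin/public/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("MyCrawler", "/admin/dashboard"))  # False: under /admin/
print(rp.can_fetch("MyCrawler", "/blog/post"))        # True: no rule matches
```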

Directives

  • User-agent — which crawler the rules apply to (e.g., User-agent: Googlebot)
  • Disallow — blocks crawling of a path (e.g., Disallow: /admin/)
  • Allow — explicitly allows crawling, overriding a broader Disallow (e.g., Allow: /admin/public/)
  • Sitemap — points to your XML sitemap (e.g., Sitemap: https://example.com/sitemap.xml)
  • Crawl-delay — seconds between requests; not supported by Google (e.g., Crawl-delay: 10)

Common robots.txt Examples

Allow Everything (Default)

User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml

An empty Disallow means "disallow nothing" — crawlers can access everything.

Block All Crawlers

User-agent: *
Disallow: /

This blocks all crawlers from all pages. Use this for staging or development environments.

Block Specific Directories

User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /internal/
Disallow: /tmp/

Sitemap: https://example.com/sitemap.xml

Block Specific Crawlers

# Block AI training bots
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: CCBot
Disallow: /

# Allow search engines
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

Sitemap: https://example.com/sitemap.xml

E-Commerce Site

User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /search?
Disallow: /api/
Allow: /

Sitemap: https://example.com/sitemap.xml

WordPress Site

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /author/
Disallow: /?s=
Disallow: /search/

Sitemap: https://example.com/sitemap.xml

Pattern Matching

Google and Bing support wildcard patterns:

Asterisk (*) — Matches Any Sequence

# Block all PDF files
Disallow: /*.pdf$

# Block all URLs with query parameters
Disallow: /*?

# Block URLs containing "print"
Disallow: /*print*

Dollar Sign ($) — End of URL

# Block exactly /about but not /about/team
Disallow: /about$
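The matching behavior of both wildcards can be sketched in a few lines of Python. This is an illustrative reimplementation, not Google's actual matcher: '*' becomes '.*' in a regex, and a trailing '$' anchors the match to the end of the URL.

```python
import re

def robots_pattern_matches(pattern: str, path: str) -> bool:
    """Check a URL path against a robots.txt pattern:
    '*' matches any sequence of characters, and a trailing '$'
    anchors the match to the end of the URL."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then splice '*' back in as '.*'
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"  # must consume the rest of the path
    return re.match(regex, path) is not None

print(robots_pattern_matches("/*.pdf$", "/files/report.pdf"))      # True
print(robots_pattern_matches("/*.pdf$", "/files/report.pdf?v=2"))  # False
print(robots_pattern_matches("/about$", "/about/team"))            # False
```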

robots.txt vs Meta Robots vs X-Robots-Tag

  • robots.txt — site-wide; blocks crawling of paths
  • Meta robots — per page; controls indexing and link following
  • X-Robots-Tag — HTTP header; controls indexing for non-HTML files

Key difference: robots.txt prevents crawling, while meta robots controls indexing. A page blocked by robots.txt might still appear in search results (as a URL without a snippet) if other pages link to it.

To fully remove a page from search results, use:

<meta name="robots" content="noindex, nofollow" />

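For non-HTML files such as PDFs, which can't carry a meta tag, the same directive can be sent as an HTTP header. A minimal nginx sketch (Apache's mod_headers can do the equivalent with Header set):

```nginx
# Keep all PDFs out of search results (nginx example)
location ~* \.pdf$ {
    add_header X-Robots-Tag "noindex, nofollow";
}
```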
Learn more about meta tags in our Meta Tags SEO Guide.

Common Mistakes

1. Blocking CSS and JavaScript

# DON'T do this
Disallow: /css/
Disallow: /js/

Google needs to render your pages to understand them. Blocking CSS and JS prevents proper rendering and hurts your rankings.

2. Blocking Your Entire Site Accidentally

# This blocks EVERYTHING
User-agent: *
Disallow: /

This is correct for staging environments but catastrophic for production. Always double-check before deploying.

3. Using robots.txt for Security

robots.txt is publicly accessible — anyone can read it at yoursite.com/robots.txt. Listing paths you want to hide actually tells attackers where to look.

4. Forgetting the Sitemap Directive

Always include your sitemap URL:

Sitemap: https://example.com/sitemap.xml

This helps crawlers discover all your pages, especially new or deeply nested ones.

5. Blocking Pages You Want Indexed

If you Disallow a page, search engines won't crawl it, so they can't read its content. That includes any noindex tag on the page: a crawler that can't fetch the page will never see the tag. Make sure you're not accidentally blocking pages you want indexed.
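One way to guard against this mistake is a pre-deploy check with Python's standard-library parser. The paths below are hypothetical placeholders; substitute the pages you actually care about.

```python
import urllib.robotparser

def blocked_paths(robots_txt: str, paths, agent: str = "Googlebot"):
    """Return the subset of paths that this robots.txt blocks
    for the given crawler."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return [p for p in paths if not rp.can_fetch(agent, p)]

rules = "User-agent: *\nDisallow: /admin/\n"
# Placeholder paths -- list your own important pages here.
print(blocked_paths(rules, ["/", "/products/widget", "/admin/login"]))
# -> ['/admin/login']
```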

Testing Your robots.txt

Google Search Console

Search Console includes a robots.txt report that shows which robots.txt files Google has found for your site, when they were last crawled, and any parsing errors or warnings. To check whether a specific URL is blocked, use the URL Inspection tool.

Manual Testing

You can test your robots.txt by visiting https://yoursite.com/robots.txt directly. Make sure:

  • The file is accessible (returns HTTP 200)
  • The syntax is correct
  • Your important pages aren't blocked
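The syntax check can be partially scripted. Below is a minimal linter sketch; the recognized field names are the common ones, not an exhaustive list.

```python
KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}

def lint_robots(robots_txt: str):
    """Return (line_number, message) pairs for suspicious lines."""
    issues = []
    for lineno, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue  # blank and comment-only lines are fine
        if ":" not in line:
            issues.append((lineno, "missing ':' separator"))
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field not in KNOWN_FIELDS:
            issues.append((lineno, f"unknown field '{field}'"))
    return issues

good = "User-agent: *\nDisallow: /admin/\n\nSitemap: https://example.com/sitemap.xml\n"
print(lint_robots(good))                  # -> []
print(lint_robots("Dissallow: /admin/"))  # -> [(1, "unknown field 'dissallow'")]
```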

Generate Your robots.txt

Writing robots.txt manually is straightforward for simple sites, but larger sites with many directories benefit from a structured approach. Use our Robots.txt Generator to create a properly formatted file with the right directives for your site type.

Our generator supports:

  • Common presets (WordPress, e-commerce, SPA)
  • Custom Allow/Disallow rules
  • Multiple User-agent blocks
  • AI bot blocking options
  • Automatic Sitemap directive

Best Practices

  1. Keep it simple — only block what truly shouldn't be crawled
  2. Always include a Sitemap directive — help crawlers find your content
  3. Don't block CSS or JavaScript — search engines need them for rendering
  4. Test after every change — review the robots.txt report in Google Search Console
  5. Review periodically — as your site grows, update your rules
  6. Use specific paths — Disallow: /admin/ is better than broad patterns
  7. Consider AI crawlers — decide if you want to allow or block GPTBot, CCBot, etc.

Wrapping Up

A well-configured robots.txt file helps search engines crawl your site efficiently, focusing their budget on your most important pages. Keep it simple, test your rules, and review it regularly.

Generate your robots.txt file quickly with our free Robots.txt Generator, and check your other SEO meta tags with the Meta Tag Generator.
