🤖Robots.txt vs Meta Robots: What's the Difference?

Two different tools for controlling crawler access — when to use robots.txt to block crawling versus the meta robots tag to block indexing, and why the distinction matters.

Hugo Team·June 24, 2026

robots.txt · meta robots · noindex · nofollow · crawl control · indexing

Many webmasters confuse robots.txt and the meta robots tag — using one when they need the other. They control different things at different stages: robots.txt controls crawling; meta robots controls indexing.

robots.txt: Crawl Control

robots.txt is a file at your domain root (example.com/robots.txt) that tells crawlers which URLs they are allowed to fetch. Compliant crawlers such as Googlebot check it before fetching pages from the site.

text

User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/

Sitemap: https://example.com/sitemap.xml
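
One way to sanity-check rules like these is to run them through Python's standard-library `urllib.robotparser`. The sketch below parses the example file and asks whether a few URLs may be fetched. The three test paths are made up for illustration, and this parser's matching can differ from Googlebot's in edge cases (for example, wildcards and rule precedence), so treat it as a rough check rather than a definitive answer.

```python
from urllib.robotparser import RobotFileParser

# Same rules as the example above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical URLs, chosen only to exercise each rule.
for path in ("/admin/settings", "/public/page", "/blog/post"):
    url = "https://example.com" + path
    print(path, parser.can_fetch("*", url))

# site_maps() is available in Python 3.8 and later.
print(parser.site_maps())
```

Paths that match no rule, such as `/blog/post` here, are treated as allowed.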

⚠️robots.txt Does NOT Prevent Indexing

If another site links to a page you've disallowed in robots.txt, Google can still index that URL; it just won't crawl the content. To prevent indexing, use a noindex meta tag instead, and leave the page crawlable so the tag can be read.

Meta Robots Tag: Index Control

The meta robots tag (or X-Robots-Tag HTTP header) controls what crawlers do with a page after they fetch it. The most common directives are:

Directive	Meaning	Use Case
index	Allow indexing (default)	All normal pages
noindex	Do not add to search index	Thank-you pages, login, staging
follow	Follow links (default)	All normal pages
nofollow	Do not follow links	Untrusted user content
noarchive	No cached copy in SERPs	Pages with sensitive info
nosnippet	No text snippet in search results	Pages whose content should not be previewed in results

html

<meta name="robots" content="noindex, nofollow">
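
To see which directives a page actually declares, the tag can be read with Python's standard-library `html.parser`. This is a minimal sketch under a few assumptions: `RobotsMetaParser`, `split_directives`, and `page_directives` are hypothetical names, only `<meta name="robots">` is inspected (not crawler-specific names such as `googlebot`), and the optional `x_robots_tag` argument stands in for a header value you would obtain from your own HTTP client.

```python
from html.parser import HTMLParser


def split_directives(value):
    """Turn 'noindex, nofollow' into {'noindex', 'nofollow'}."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Attribute values can be None when an attribute has no value.
        if (attr.get("name") or "").lower() == "robots":
            self.directives |= split_directives(attr.get("content") or "")


def page_directives(html, x_robots_tag=""):
    """Merge directives from the HTML and an optional X-Robots-Tag value.

    Assumes the header value carries no user-agent prefix.
    """
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives | split_directives(x_robots_tag)


html = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(sorted(page_directives(html)))
```

A page with no robots meta tag and no header yields an empty set, which corresponds to the defaults (index, follow).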

The Classic Mistake

Never block a page in robots.txt AND set noindex on it. If robots.txt prevents crawling, Googlebot never fetches the page, so it never reads the noindex tag. The correct approach for pages that should exist but stay out of search results: allow crawling + noindex.

Goal	Correct Approach
Page exists, don't rank it	Allow crawl + noindex tag
Page is purely internal, save crawl budget	Disallow in robots.txt (no noindex needed)
Page must rank	Allow crawl + no noindex tag
Block all crawlers completely	Disallow: / in robots.txt (note: won't prevent link-based indexing)
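
The table above can be turned into a small automated check. The sketch below is one possible way to do it, using Python's standard-library `urllib.robotparser`. The `audit` function, its messages, and the sample rules and URLs are all hypothetical, and the set of page directives is assumed to have been collected separately.

```python
from urllib.robotparser import RobotFileParser


def audit(robots_txt, url, page_directives, agent="*"):
    """Classify a URL by combining its crawl rule with its index directive."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    crawlable = parser.can_fetch(agent, url)
    noindex = "noindex" in page_directives

    if not crawlable and noindex:
        # The classic mistake: the tag sits on a page that is never fetched.
        return "conflict: noindex is never read because crawling is blocked"
    if not crawlable:
        return "not crawled: URL may still be indexed through external links"
    if noindex:
        return "crawled and kept out of the index"
    return "crawled and eligible for indexing"


# Hypothetical rules and URLs for illustration.
ROBOTS_TXT = "User-agent: *\nDisallow: /private/\n"

print(audit(ROBOTS_TXT, "https://example.com/private/report", {"noindex"}))
print(audit(ROBOTS_TXT, "https://example.com/thank-you", {"noindex"}))
print(audit(ROBOTS_TXT, "https://example.com/pricing", set()))
```

Running a check like this over a list of URLs is a quick way to find pages where the noindex tag and the Disallow rule are working against each other.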
