🤖 Robots.txt vs Meta Robots: What's the Difference?

Two different tools for controlling crawler access — when to use robots.txt to block crawling versus the meta robots tag to block indexing, and why the distinction matters.

Hugo Team · June 24, 2026
Tags: robots.txt, meta robots, noindex, nofollow, crawl control, indexing

Many webmasters confuse robots.txt and the meta robots tag — using one when they need the other. They control different things at different stages: robots.txt controls crawling; meta robots controls indexing.

robots.txt: Crawl Control

robots.txt is a plain-text file at your domain root (example.com/robots.txt) that tells crawlers which URLs they are allowed to fetch. Compliant crawlers such as Googlebot check it before requesting pages on that host.

```text
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/

Sitemap: https://example.com/sitemap.xml
```
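
To see how a crawler might interpret these rules, the sketch below feeds them to Python's standard `urllib.robotparser`. The `is_allowed` helper and the example paths are illustrative assumptions, not part of any crawler. Python's parser applies rules in file order, which can differ from Googlebot's most-specific-match behaviour when rules overlap.

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the robots.txt example above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/

Sitemap: https://example.com/sitemap.xml
"""


def is_allowed(url: str, user_agent: str = "*") -> bool:
    """Return True if the rules above let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical paths chosen to exercise each rule plus the default.
    for path in ("/admin/users", "/private/report", "/public/index.html", "/blog/post"):
        verdict = "allowed" if is_allowed("https://example.com" + path) else "blocked"
        print(path, verdict)
```

A path that matches no rule, such as `/blog/post` here, is allowed by default. The check only answers "may this URL be fetched?" and says nothing about whether the URL can appear in search results.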

⚠️ robots.txt Does NOT Prevent Indexing

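If another site links to a page you've disallowed in robots.txt, Google can still index that URL. It just won't crawl the content, so the result may show little more than the bare URL and the anchor text of links pointing to it. To keep a page out of the index, use a noindex meta tag instead, and leave the page crawlable so the tag can be read.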

Meta Robots Tag: Index Control

The meta robots tag (or the X-Robots-Tag HTTP header) controls what crawlers do with a page AFTER they fetch it. The most common directives are:

| Directive | Meaning | Use Case |
| --- | --- | --- |
| index | Allow indexing (default) | All normal pages |
| noindex | Do not add to search index | Thank-you pages, login, staging |
| follow | Follow links (default) | All normal pages |
| nofollow | Do not follow links | Untrusted user content |
| noarchive | No cached copy in SERPs | Pages with sensitive info |
| nosnippet | No description snippet | Pages whose text should not be excerpted in results |

```html
<meta name="robots" content="noindex, nofollow">
```
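
As a rough illustration of what happens after the fetch, the sketch below collects the directives a crawler would find in a page's meta robots tag and, optionally, in an X-Robots-Tag header value. The names `MetaRobotsParser` and `page_directives` are hypothetical, and the header handling is simplified: it ignores user-agent prefixes such as `googlebot: noindex`.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collect directives from every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None for valueless attributes.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() == "robots":
            self.directives |= _split_directives(attr.get("content", ""))


def _split_directives(value: str) -> set:
    """Split a comma-separated directive list into lowercase tokens."""
    return {token.strip().lower() for token in value.split(",") if token.strip()}


def page_directives(html: str, x_robots_tag: str = "") -> set:
    """Return the union of meta-tag and header directives for one response."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return parser.directives | _split_directives(x_robots_tag)


if __name__ == "__main__":
    html = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    print(sorted(page_directives(html)))
```

The header argument matters for responses that cannot carry a meta tag, such as PDFs or images, where X-Robots-Tag is the only way to send these directives.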

The Classic Mistake

Never block a page in robots.txt AND set noindex on it. If robots.txt prevents crawling, Googlebot never fetches the page and so never reads the noindex tag. The correct approach for pages that should exist but stay out of search results: allow crawling and add noindex.

| Goal | Correct Approach |
| --- | --- |
| Page exists, don't rank it | Allow crawl + noindex tag |
| Page is purely internal, save crawl budget | Disallow in robots.txt (no noindex needed) |
| Page must rank | Allow crawl + no noindex tag |
| Block all crawlers completely | Disallow: / in robots.txt (note: won't prevent link-based indexing) |
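
The decision table can be turned into a small check. The sketch below is one possible way to flag the classic mistake, assuming you already know whether a page carries noindex; the `diagnose` function and its return strings are hypothetical, and the crawl check relies on Python's standard `urllib.robotparser` rather than Google's own matcher.

```python
from urllib.robotparser import RobotFileParser


def diagnose(robots_txt: str, url: str, has_noindex: bool, user_agent: str = "*") -> str:
    """Describe how a robots.txt rule set and a noindex flag interact for one URL."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    crawlable = rules.can_fetch(user_agent, url)

    if not crawlable and has_noindex:
        # The classic mistake: the tag exists but can never be read.
        return "conflict: crawling is blocked, so noindex is never seen"
    if not crawlable:
        return "not crawled: URL may still be indexed via external links"
    if has_noindex:
        return "crawled and kept out of the index"
    return "crawled and eligible for indexing"


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    print(diagnose(robots, "https://example.com/private/thanks", has_noindex=True))
    print(diagnose(robots, "https://example.com/thanks", has_noindex=True))
```

Running a check like this over a site's URL list is one way to find pages where a noindex tag sits behind a Disallow rule and therefore has no effect.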

References

  1. Google: Introduction to robots.txt (how to control crawler access to your site), developers.google.com
  2. Google: Meta tags that Google understands (noindex, nofollow, and other directives), developers.google.com
