Robots.txt vs Meta Robots: What's the Difference?
Two different tools for controlling crawler access — when to use robots.txt to block crawling versus the meta robots tag to block indexing, and why the distinction matters.
Many webmasters confuse robots.txt and the meta robots tag — using one when they need the other. They control different things at different stages: robots.txt controls crawling; meta robots controls indexing.
robots.txt: Crawl Control
robots.txt is a file at your domain root (example.com/robots.txt) that tells crawlers which URLs they are allowed to fetch. Crawlers such as Googlebot check it before fetching pages on that host, typically working from a recently cached copy rather than re-reading it for every URL.
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/
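Sitemap: https://example.com/sitemap.xml
If another site links to a page you've disallowed in robots.txt, Google can still index that URL; it just can't crawl the content, so the listing may appear without a description. To keep a page out of the index, use a noindex meta tag instead, and leave the page crawlable so the tag can be read.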
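To see how the sample rules above are evaluated, here is a minimal sketch using Python's standard `urllib.robotparser` module. The names `ROBOTS_TXT` and `build_parser` are illustrative, not part of any library. Note that this parser applies the first matching rule in file order, while Google applies the most specific matching rule, so results can differ when rules overlap; the sample rules here do not overlap.

```python
from urllib.robotparser import RobotFileParser

# The sample rules from this article, inlined so the sketch runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/
Sitemap: https://example.com/sitemap.xml
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# can_fetch() answers only "may this agent crawl this URL?".
# It says nothing about whether the URL can end up in a search index.
for path in ("/admin/users", "/private/report.html", "/public/index.html", "/blog/post"):
    url = "https://example.com" + path
    print(path, "crawlable:", parser.can_fetch("Googlebot", url))
```

Paths with no matching rule, such as `/blog/post`, are crawlable by default.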
Meta Robots Tag: Index Control
The meta robots tag (or X-Robots-Tag HTTP header) controls what crawlers do with a page AFTER they fetch it. The most common directives are:
| Directive | Meaning | Use Case |
|---|---|---|
| index | Allow indexing (default) | All normal pages |
| noindex | Do not add to search index | Thank-you pages, login, staging |
| follow | Follow links (default) | All normal pages |
| nofollow | Do not follow links | Untrusted user content |
| noarchive | No cached copy in SERPs | Pages with sensitive info |
| nosnippet | No description snippet | Pages whose text should not be shown in search results |
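The directives in the table can be read back from a page's HTML to check what a crawler would see. The following is a rough sketch built on Python's standard `html.parser`; the class `MetaRobotsParser` and the helpers `meta_robots_directives` and `is_indexable` are hypothetical names chosen for illustration. It looks only at the generic `robots` meta tag and ignores the X-Robots-Tag header.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values may be None (e.g. <meta name>), so normalise to "".
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        for token in attr_map.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def meta_robots_directives(html: str) -> set:
    """Return the set of robots directives declared in the HTML."""
    parser = MetaRobotsParser()
    parser.feed(html)
    parser.close()
    return parser.directives


def is_indexable(directives: set) -> bool:
    """index is the default, so only an explicit noindex blocks indexing."""
    return "noindex" not in directives


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
found = meta_robots_directives(page)
print(sorted(found), "indexable:", is_indexable(found))
```

A page with no robots meta tag yields an empty set, which this sketch treats as indexable, matching the defaults in the table.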
<meta name="robots" content="noindex, nofollow">
The Classic Mistake
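Never disallow a page in robots.txt and also rely on a noindex tag on that page. If robots.txt prevents crawling, Googlebot never fetches the page, so it never sees the noindex tag, and the URL can still be indexed from external links. The correct approach for pages you want to exist but keep out of search results: allow crawling and add noindex.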
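The conflict described here can be caught with a simple check. The sketch below is one possible way to do it, assuming you already know whether the page carries a noindex directive; the function name `diagnose` and its return strings are made up for this example, and the crawl check uses Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser


def diagnose(robots_txt: str, url: str, has_noindex: bool,
             user_agent: str = "Googlebot") -> str:
    """Describe how a crawl rule and a noindex directive interact for one URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    crawlable = parser.can_fetch(user_agent, url)

    if not crawlable and has_noindex:
        # The classic mistake: the tag exists but can never be read.
        return "conflict: noindex is never seen because crawling is blocked"
    if not crawlable:
        return "blocked from crawling; the URL may still be indexed via links"
    if has_noindex:
        return "crawlable and excluded from the index"
    return "crawlable and indexable"


rules = "User-agent: *\nDisallow: /private/\n"
print(diagnose(rules, "https://example.com/private/thanks.html", has_noindex=True))
print(diagnose(rules, "https://example.com/thanks.html", has_noindex=True))
```

The second call shows the recommended setup: the page stays crawlable, so the noindex directive can take effect.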
| Goal | Correct Approach |
|---|---|
| Page exists but should stay out of search results | Allow crawl + noindex tag |
| Page is purely internal, save crawl budget | Disallow in robots.txt (a noindex tag would never be read) |
| Page must rank | Allow crawl + no noindex tag |
| Block all crawlers completely | Disallow: / in robots.txt (note: won't prevent link-based indexing) |