🕷️ How the Crawler Works

Understand how Hugo discovers and analyzes your website pages through sitemap parsing, robots.txt reading, and intelligent URL discovery.

Hugo Team · February 26, 2026
crawler · sitemap · robots.txt · discovery · subpages

When you submit a URL, Hugo doesn't just analyze that single page — it discovers and scans related subpages too. This gives you a holistic view of your site's SEO health. Here's exactly how the discovery process works.

URL Discovery Strategy

URL discovery pipeline: 🤖 robots.txt (Sitemap directives) → 🗺️ sitemap.xml (standard locations) → 📂 sub-sitemaps (up to 10) → 🔗 extract URLs (up to 20 pages) → 🔄 fallback: parse &lt;a&gt; links

Hugo uses a multi-step strategy to find pages on your site:

  1. Parse your robots.txt file for Sitemap: directives.[1]
  2. Try standard sitemap locations: /sitemap.xml and /sitemap_index.xml.[2]
  3. If a sitemap index is found, parse up to 10 sub-sitemaps.
  4. Extract up to 20 subpage URLs from the sitemap entries.
  5. Fallback: If no sitemap exists, extract internal links from the homepage HTML.
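The steps above can be sketched in Python. These helper functions are illustrative, not Hugo's actual API; they show the parsing at each stage without any network I/O:

```python
import re
from html.parser import HTMLParser

def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Step 1: pull Sitemap: directives out of robots.txt."""
    return [m.group(1).strip()
            for m in re.finditer(r"(?im)^sitemap:\s*(\S+)", robots_txt)]

def default_sitemap_candidates(origin: str) -> list[str]:
    """Step 2: standard locations tried when robots.txt names none."""
    return [origin + "/sitemap.xml", origin + "/sitemap_index.xml"]

class LinkExtractor(HTMLParser):
    """Step 5 fallback: collect internal <a href> links from homepage HTML."""
    def __init__(self, origin: str):
        super().__init__()
        self.origin = origin
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href", "")
            # Keep only internal links: relative paths or same-origin URLs.
            if href.startswith("/") or href.startswith(self.origin):
                self.links.append(href)
```

In the fallback case, feeding the homepage HTML into `LinkExtractor` and truncating the result to 20 entries mirrors the cap described in step 4.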

Analysis Depth

The main page (the URL you enter) receives a deep analysis with all check categories including Performance. Subpages receive a lighter analysis — they skip performance checks and premium modules, focusing on metadata, content, technical, links, structured data, and accessibility.
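One way to picture the two analysis depths is as two category sets. The category names come from the lists above; representing them as Python sets is purely an assumption for illustration:

```python
# Hypothetical category sets; names mirror those listed in the article.
MAIN_PAGE_CATEGORIES = {
    "metadata", "content", "technical", "links",
    "structured_data", "accessibility", "performance", "premium",
}
# Subpages skip performance checks and premium modules.
SUBPAGE_CATEGORIES = MAIN_PAGE_CATEGORIES - {"performance", "premium"}

def categories_for(url: str, main_url: str) -> set[str]:
    """Deep analysis for the submitted URL, lighter checks for subpages."""
    return MAIN_PAGE_CATEGORIES if url == main_url else SUBPAGE_CATEGORIES
```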

ℹ️Concurrency

Subpages are analyzed concurrently in batches of 5 for faster results. The real-time dashboard shows progress as each page completes.
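Batch-of-5 concurrency can be sketched with `asyncio.gather`; here `analyze` is a stand-in for the real per-page analysis:

```python
import asyncio

BATCH_SIZE = 5  # matches the batch size described above

async def analyze(url: str) -> dict:
    # Placeholder for the real per-page analysis work.
    await asyncio.sleep(0)
    return {"url": url, "status": "done"}

async def analyze_in_batches(urls: list[str]) -> list[dict]:
    """Run subpage analyses concurrently, five at a time."""
    results: list[dict] = []
    for i in range(0, len(urls), BATCH_SIZE):
        batch = urls[i:i + BATCH_SIZE]
        # Each batch runs concurrently; batches run one after another,
        # which is when a dashboard could report incremental progress.
        results += await asyncio.gather(*(analyze(u) for u in batch))
    return results
```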

Robots.txt Compliance

Hugo checks your robots.txt to ensure your site is accessible to crawlers. If robots.txt contains a Disallow: / directive, Hugo will flag this as a warning since it blocks all crawlers from indexing your site.[1]

Security & Safety

Hugo includes built-in SSRF (Server-Side Request Forgery) protection.[3] It blocks requests to private IP ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), localhost, and Docker internal addresses.[4] Only public URLs on ports 80 and 443 are analyzed.
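The kind of guard described can be sketched with Python's `ipaddress` module; the function name and exact policy here are illustrative, not Hugo's implementation:

```python
import ipaddress
import socket
from urllib.parse import urlparse

ALLOWED_PORTS = {80, 443}

def is_safe_target(url: str) -> bool:
    """Reject private, loopback, and link-local targets before fetching."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    if port not in ALLOWED_PORTS:
        return False
    try:
        # Resolve first, then validate the address, so a hostname
        # can't hide a private IP behind DNS.
        addr = ipaddress.ip_address(socket.gethostbyname(parsed.hostname))
    except (socket.gaierror, ValueError, TypeError):
        return False
    # Covers 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, and 127.0.0.0/8.
    return not (addr.is_private or addr.is_loopback or addr.is_link_local)
```

Note that a production guard would also re-check the address on redirects, since a public URL can redirect to an internal one.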

References

  1. Google Search Central — Introduction to robots.txt — developers.google.com
  2. Sitemaps.org — XML Sitemap Protocol — sitemaps.org
  3. OWASP — Server-Side Request Forgery Prevention Cheat Sheet — cheatsheetseries.owasp.org
  4. IETF RFC 1918 — Address Allocation for Private Internets — rfc-editor.org
