🕷️ How the Crawler Works
Understand how Hugo discovers and analyzes your website pages through sitemap parsing, robots.txt reading, and intelligent URL discovery.
When you submit a URL, Hugo doesn't just analyze that single page — it discovers and scans related subpages too. This gives you a holistic view of your site's SEO health. Here's exactly how the discovery process works.
URL Discovery Strategy
Hugo uses a multi-step strategy to find pages on your site:
- Parse your robots.txt file for Sitemap: directives.[1]
- Try standard sitemap locations: /sitemap.xml and /sitemap_index.xml.[2]
- If a sitemap index is found, parse up to 10 sub-sitemaps.
- Extract up to 20 subpage URLs from the sitemap entries.
- Fallback: If no sitemap exists, extract internal links from the homepage HTML.
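The steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not Hugo's actual implementation: the function names, the caller-supplied `fetch` helper, and the choice to stop at the first sitemap that yields URLs are all hypothetical. Only the limits (10 sub-sitemaps, 20 subpages) and the order of the steps come from the description.

```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

MAX_SUB_SITEMAPS = 10  # limit stated in the text
MAX_SUBPAGES = 20      # limit stated in the text


def sitemap_urls_from_robots(robots_txt):
    """Collect the URLs named in 'Sitemap:' lines (the field name is case-insensitive)."""
    pattern = re.compile(r"^\s*sitemap\s*:\s*(\S+)", re.IGNORECASE | re.MULTILINE)
    return pattern.findall(robots_txt)


def parse_sitemap(xml_text):
    """Return (is_index, locs) for one sitemap document, or None if it is not valid XML."""
    try:
        root = ET.fromstring(xml_text)
    except (ET.ParseError, ValueError):
        return None
    local = lambda tag: tag.rsplit("}", 1)[-1]  # drop the XML namespace prefix
    locs = [child.text.strip()
            for entry in root for child in entry
            if local(child.tag) == "loc" and child.text]
    return local(root.tag) == "sitemapindex", locs


class _LinkCollector(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href" and value)


def internal_links(html, base_url):
    """Fallback: same-host links found in the homepage HTML, de-duplicated in order."""
    collector = _LinkCollector()
    collector.feed(html)
    host = urlparse(base_url).netloc
    links = []
    for href in collector.hrefs:
        absolute = urljoin(base_url, href).split("#", 1)[0]
        if urlparse(absolute).netloc == host and absolute != base_url and absolute not in links:
            links.append(absolute)
    return links


def discover_subpages(site_url, fetch):
    """fetch(url) is a caller-supplied function returning the body text, or None on failure."""
    robots_txt = fetch(urljoin(site_url, "/robots.txt")) or ""
    candidates = sitemap_urls_from_robots(robots_txt)
    candidates += [urljoin(site_url, "/sitemap.xml"), urljoin(site_url, "/sitemap_index.xml")]

    pages = []
    for candidate in candidates:
        parsed = parse_sitemap(fetch(candidate) or "")
        if parsed is None:
            continue
        is_index, locs = parsed
        if is_index:
            # A sitemap index lists other sitemaps; read at most MAX_SUB_SITEMAPS of them.
            for sub_url in locs[:MAX_SUB_SITEMAPS]:
                sub = parse_sitemap(fetch(sub_url) or "")
                if sub is not None and not sub[0]:
                    pages.extend(sub[1])
        else:
            pages.extend(locs)
        if pages:
            break  # assumption: stop at the first sitemap that yields URLs

    if not pages:
        # No usable sitemap: fall back to links in the homepage HTML.
        pages = internal_links(fetch(site_url) or "", site_url)
    return pages[:MAX_SUBPAGES]
```

Passing `fetch` in as a parameter keeps the discovery logic separate from networking, so the same function can be exercised against canned responses (for example, a dictionary's `get` method) before it is wired to a real HTTP client.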
Analysis Depth
The main page (the URL you enter) receives a deep analysis with all check categories including Performance. Subpages receive a lighter analysis — they skip performance checks and premium modules, focusing on metadata, content, technical, links, structured data, and accessibility.
Subpages are analyzed concurrently in batches of 5 for faster results. The real-time dashboard shows progress as each page completes.
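Batched concurrency of this kind can be sketched with Python's asyncio. The sketch below is illustrative only: `analyze_page` and `on_progress` are hypothetical callables supplied by the caller, and only the batch size of 5 comes from the description.

```python
import asyncio

BATCH_SIZE = 5  # batch size stated in the text


async def analyze_in_batches(urls, analyze_page, on_progress=None):
    """Run analyze_page(url) for every URL, at most BATCH_SIZE at a time.

    analyze_page is an async callable; on_progress(done, total) is called
    each time a single page finishes, whether it succeeded or failed.
    """
    results = {}

    async def run_one(url):
        try:
            results[url] = await analyze_page(url)
        except Exception as exc:  # keep one failing page from cancelling its batch
            results[url] = exc
        if on_progress is not None:
            on_progress(len(results), len(urls))

    for start in range(0, len(urls), BATCH_SIZE):
        batch = urls[start:start + BATCH_SIZE]
        # The next batch starts only after every page in this one has finished.
        await asyncio.gather(*(run_one(url) for url in batch))
    return results
```

With 20 subpages this runs four batches of five. A semaphore-based limit would keep five requests in flight continuously instead of waiting for the slowest page in each batch, but the batch form matches the description more directly.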
Robots.txt Compliance
Hugo checks your robots.txt to confirm that crawlers can reach your site. If robots.txt contains a Disallow: / directive, Hugo flags it as a warning, because that rule tells every crawler it applies to not to fetch any page on your site.[1]
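A check of this kind can be approximated with Python's standard-library `urllib.robotparser`. This is a sketch under assumptions, not Hugo's implementation: the function name `blocks_entire_site` is hypothetical, and the check only asks whether the root path may be fetched by a generic crawler.

```python
from urllib.robotparser import RobotFileParser


def blocks_entire_site(robots_txt, user_agent="*"):
    """Return True if the given robots.txt text forbids fetching the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, "/")
```

For example, `blocks_entire_site("User-agent: *\nDisallow: /")` returns `True`, while a file that only disallows `/admin/` returns `False`. Testing the root path alone is a simplification: a file can disallow `/` and still re-open individual paths with Allow rules, which this sketch does not examine.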
Security & Safety
Hugo includes built-in SSRF (Server-Side Request Forgery) protection.[3] It blocks requests to private IP ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), localhost, and Docker internal addresses.[4] Only public URLs on ports 80 and 443 are analyzed.
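The rules above can be illustrated with Python's `ipaddress` module. This is a hedged sketch rather than Hugo's code: the function names and the injectable `resolve` parameter are hypothetical, and the link-local check is an extra assumption that the text does not list. The three private ranges and the two allowed ports come from the description.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

ALLOWED_PORTS = {80, 443}  # ports stated in the text
PRIVATE_NETWORKS = [       # RFC 1918 ranges stated in the text
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_ip_allowed(ip):
    """Return False for loopback, link-local, and RFC 1918 private addresses."""
    address = ipaddress.ip_address(ip.split("%", 1)[0])  # drop any IPv6 zone id
    mapped = getattr(address, "ipv4_mapped", None)
    if mapped is not None:
        address = mapped  # treat ::ffff:a.b.c.d as the IPv4 address it wraps
    if address.is_loopback or address.is_link_local:  # link-local: added assumption
        return False
    return not any(address in network for network in PRIVATE_NETWORKS)


def _resolve(hostname):
    """Resolve a hostname to the list of IP address strings it maps to."""
    return [info[4][0] for info in socket.getaddrinfo(hostname, None)]


def is_url_allowed(url, resolve=_resolve):
    """Accept only http(s) URLs on port 80/443 whose host resolves to allowed IPs."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        port = parts.port
    except ValueError:  # malformed or out-of-range port
        return False
    if port is None:
        port = 443 if parts.scheme == "https" else 80
    if port not in ALLOWED_PORTS:
        return False
    try:
        addresses = resolve(parts.hostname)
    except OSError:  # DNS failure
        return False
    # Every resolved address must pass; one private address is enough to reject.
    return bool(addresses) and all(is_ip_allowed(a) for a in addresses)
```

Checking the resolved addresses rather than the hostname string matters because a public-looking hostname can point at an internal address. A production check would also need to connect to the same address it validated, since a second DNS lookup at fetch time could return a different answer.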
References
- [1] Google Search Central, Introduction to robots.txt, developers.google.com
- [2] Sitemaps.org, XML Sitemap Protocol, sitemaps.org
- [3] OWASP, Server-Side Request Forgery Prevention Cheat Sheet, cheatsheetseries.owasp.org
- [4] IETF RFC 1918, Address Allocation for Private Internets, rfc-editor.org