Robots.txt
Robots Exclusion Protocol
A text file at the site root instructing search engine crawlers which pages or directories to avoid.
Technical Details
The Robots Exclusion Protocol uses two mechanisms: robots.txt (file-level; prevents crawling but not indexing) and meta robots tags (page-level; control indexing and link following). Common meta robots directives: 'noindex' (exclude from search results), 'nofollow' (don't pass link equity), 'noarchive' (no cached copy). The X-Robots-Tag HTTP header provides the same controls for non-HTML resources such as PDFs and images. A page blocked only in robots.txt can still appear in search results if other pages link to it; applying 'noindex' (via the meta tag or the X-Robots-Tag header) is the reliable way to exclude it. Note that a crawler must be able to fetch a page to see its 'noindex' directive, so a page should not be both disallowed in robots.txt and marked 'noindex'.
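The page-level controls described above look like this in practice; the snippet below is an illustrative sketch (the directives are real, the context comments are assumptions):

```
<!-- In the page's <head>: exclude from search results, don't pass link equity -->
<meta name="robots" content="noindex, nofollow">

# Equivalent HTTP response header, e.g. for a PDF or image:
X-Robots-Tag: noindex, nofollow
```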
Example
```
# robots.txt
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/internal/

Sitemap: https://peasytools.com/sitemap.xml
```
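Rules like the ones above can be checked programmatically; a minimal sketch using Python's standard-library `urllib.robotparser` (the sample URLs are illustrative):

```python
from urllib import robotparser

# The Disallow rules from the robots.txt example above.
# The redundant "Allow: /" line is omitted here: allowing everything
# is already the default, and Python's parser applies rules in file
# order, so a leading "Allow: /" would mask the Disallow lines.
rules = [
    "User-agent: *",
    "Disallow: /admin/",
    "Disallow: /api/internal/",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("*", "https://peasytools.com/admin/users"))  # False
print(rp.can_fetch("*", "https://peasytools.com/blog/post"))    # True
```

Note that major crawlers follow RFC 9309 precedence, where the most specific (longest) matching rule wins regardless of order, so real-world robots.txt files may behave slightly differently from Python's order-based parser.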