robots.txt: Publish Clear Crawl Rules at the Site Root

Robots.txt is one of the first files crawlers look for when they evaluate a site.

It should be present, intentional, and free of launch-blocking mistakes.

What It Is

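The robots.txt file is a plain-text file served from the site root (for example, https://example.com/robots.txt). It tells well-behaved bots which paths they may crawl; compliance is voluntary.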

User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
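
To see how a compliant crawler might read the rules above, here is a minimal sketch using Python's standard-library urllib.robotparser. The bot name "ExampleBot" and the /pricing path are assumptions made up for illustration, and site_maps() needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# The same three directives shown above, held in memory (no network access).
ROBOTS_TXT = """\
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" and "/pricing" are hypothetical; any agent matches "User-agent: *".
allowed = parser.can_fetch("ExampleBot", "https://example.com/pricing")
sitemaps = parser.site_maps()

print(allowed)   # True
print(sitemaps)  # ['https://example.com/sitemap.xml']
```

Individual crawlers may resolve edge cases differently from this parser, so treat the result as an approximation rather than a guarantee of how any specific bot behaves.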

Why It Matters

  • It communicates basic crawl rules early.
  • It can point crawlers to the sitemap.
  • A reviewed file helps you avoid accidentally blocking important public paths.

Best Practices

  1. Publish the file at /robots.txt.
  2. Keep rules simple unless you have a clear reason for complexity.
  3. Review staging disallow rules before deploying to production.
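
The staging review in step 3 can be partly automated. The sketch below is one possible pre-deploy guard, not a standard tool: blocks_whole_site is a hypothetical helper name, and the two sample files are assumptions that stand in for your real staging and production robots.txt content.

```python
from urllib.robotparser import RobotFileParser


def blocks_whole_site(robots_txt: str, user_agent: str = "*") -> bool:
    """Return True if the rules forbid the given agent from fetching the root path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, "/")


# Hypothetical sample files for illustration.
STAGING = "User-agent: *\nDisallow: /\n"
PRODUCTION = "User-agent: *\nAllow: /\n"

print(blocks_whole_site(STAGING))     # True: this file must not ship to production
print(blocks_whole_site(PRODUCTION))  # False
```

A check like this could run in a deployment pipeline and fail the build when the production file still blocks the root path.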

Common Mistakes

  • No file at all.
  • Leaving Disallow: / from staging.
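  • Using robots.txt as if it were a security control. The file is publicly readable and compliance is voluntary, so a Disallow rule does not restrict access to the path it names.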

Quick Checklist

  • File exists at root.
  • Important pages are not blocked.
  • Sitemap location included when useful.
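
The checklist above can be expressed as a small function. This is a sketch under stated assumptions: robots_checklist is a hypothetical name, the caller fetches the file and passes None when it is missing, and the list of important paths is something you define for your own site.

```python
from typing import List, Optional
from urllib.robotparser import RobotFileParser


def robots_checklist(robots_txt: Optional[str], important_paths: List[str]) -> dict:
    """Evaluate the three checklist items against robots.txt content.

    Pass None for robots_txt when no file was found at /robots.txt.
    """
    if robots_txt is None:
        return {"file_exists": False, "blocked_paths": [], "has_sitemap": False}

    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    # Paths that the wildcard agent may not fetch.
    blocked = [p for p in important_paths if not parser.can_fetch("*", p)]
    return {
        "file_exists": True,
        "blocked_paths": blocked,
        "has_sitemap": bool(parser.site_maps()),  # site_maps() needs Python 3.8+
    }


# Hypothetical file and paths for illustration.
sample = "User-agent: *\nDisallow: /admin/\nSitemap: https://example.com/sitemap.xml\n"
report = robots_checklist(sample, ["/", "/pricing", "/admin/login"])
print(report["blocked_paths"])  # ['/admin/login']
```

An empty blocked_paths list together with file_exists set to True covers the first two checklist items; has_sitemap covers the third.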

Final Takeaway

Robots.txt should guide discovery, not accidentally suppress it.
