Eyal Rosenthal · Web scraping at scale

robots.txt Checker: Will This Site Allow Your Scraper?

Paste a robots.txt file and the path you want to scrape to get an instant verdict per user-agent. Everything runs in your browser; nothing is sent to a server.

How to read the results

  • "allowed (no matching rule)": the path doesn't match any Disallow: rule for this user-agent. Scrape away.
  • "allowed (specific Allow rule)": a Disallow: rule would have blocked you, but a more specific Allow: rule (one with a longer matching path) overrides it. Scrape away.
  • "blocked by Disallow: /path/": the user-agent is blocked from this path. Don't scrape it (or use a different user-agent string if you have a legitimate reason).
  • Crawl-delay: the minimum number of seconds between requests that the site has asked for. It's a non-standard directive (it isn't part of RFC 9309, and Googlebot ignores it) and isn't legally binding, but it's worth respecting.
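
The three verdicts above can be reproduced with a longest-match rule, which is how RFC 9309 resolves Allow/Disallow conflicts. The sketch below is an illustration under stated assumptions, not this checker's actual source: the verdict function and the (directive, pattern) rule format are hypothetical, patterns are treated as plain path prefixes (no * or $ wildcards), and the rules are assumed to be already narrowed down to the group that applies to your user-agent.

```python
def verdict(rules, path):
    """Return a checker-style verdict string for `path`.

    `rules` is a list of (directive, pattern) tuples for ONE user-agent
    group, e.g. [("disallow", "/shop/"), ("allow", "/shop/public/")].
    This format is hypothetical and used for illustration only.
    """
    best = None  # (pattern length, is_allow, pattern) of the winning rule
    for directive, pattern in rules:
        # An empty pattern matches nothing; otherwise use prefix matching.
        if not pattern or not path.startswith(pattern):
            continue
        candidate = (len(pattern), directive.lower() == "allow", pattern)
        # Longest pattern wins; on equal length, Allow beats Disallow.
        if best is None or candidate[:2] > best[:2]:
            best = candidate

    if best is None:
        return "allowed (no matching rule)"
    if best[1]:
        return "allowed (specific Allow rule)"
    return f"blocked by Disallow: {best[2]}"


rules = [("disallow", "/shop/"), ("allow", "/shop/public/")]
print(verdict(rules, "/about"))
print(verdict(rules, "/shop/public/item"))
print(verdict(rules, "/shop/cart"))
```

Comparing (length, is_allow) tuples keeps the tie-break in one place: a longer pattern always wins, and an Allow only beats a Disallow of the same length.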

What robots.txt actually means

robots.txt is a voluntary convention, the Robots Exclusion Protocol, that dates from 1994 and was standardized as RFC 9309 in 2022. It is not legally binding in most jurisdictions, and how much legal weight ignoring it carries is contested.

In practice:

  • Major search engines respect it
  • Major LLM crawlers (GPTBot, ClaudeBot, PerplexityBot) state that they respect it
  • Most web scraping libraries don't enforce it by default
  • Hostile sites may auto-ban clients that ignore it

The recommendation: respect robots.txt in any crawler you build. It's the clearest signal you have of what the site owner wants.
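
Since most libraries won't do the check for you, one way to build it into a crawler is to gate every request on Python's standard urllib.robotparser module. This is a minimal sketch, not a complete crawler: the load_robots and fetch_if_allowed helpers and the example-scraper/0.1 user-agent string are hypothetical names chosen for illustration. Note that urllib.robotparser applies the first matching rule in file order, so its answer can differ from a longest-match checker on files that mix Allow: and Disallow: lines.

```python
from urllib import robotparser
from urllib.parse import urljoin
from urllib.request import Request, urlopen

USER_AGENT = "example-scraper/0.1"  # hypothetical; use your crawler's real name


def load_robots(site_root):
    """Download and parse <site_root>/robots.txt (performs a network request)."""
    parser = robotparser.RobotFileParser()
    parser.set_url(urljoin(site_root, "/robots.txt"))
    parser.read()
    return parser


def fetch_if_allowed(parser, url, user_agent=USER_AGENT):
    """Return the response body, or None when robots.txt disallows the URL."""
    if not parser.can_fetch(user_agent, url):
        return None  # blocked: no request is sent at all
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request) as response:
        return response.read()
```

A typical call would be fetch_if_allowed(load_robots("https://example.com"), "https://example.com/products/1"), where example.com stands in for the site you are crawling.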

Common patterns you'll see

User-agent: *
Disallow: /

Blocks all bots from everything. Rare but absolute. Don't scrape.

User-agent: GPTBot
Disallow: /

Site is opting out of OpenAI's training crawler. Their crawl, their call.

User-agent: *
Disallow: /admin/
Disallow: /search
Crawl-delay: 1

The standard polite robots.txt. Stay out of /admin/ and /search, and send at most one request per second elsewhere.
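
A crawler can honor both parts of that file, the Disallow: lines and the Crawl-delay:, in one loop. The sketch below uses the standard urllib.robotparser module; the polite_paths generator, its fallback_delay parameter, and the example-scraper agent name are hypothetical, and the sleep function is injectable only so the pacing can be observed without actually waiting.

```python
import time
from urllib import robotparser

ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /search
Crawl-delay: 1
"""


def polite_paths(robots_txt, user_agent, paths, fallback_delay=1.0, sleep=time.sleep):
    """Yield only the allowed paths, pausing between consecutive yields."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())

    # crawl_delay() returns None when the file sets no delay for this agent.
    delay = parser.crawl_delay(user_agent)
    if delay is None:
        delay = fallback_delay  # assumption: pick your own conservative default

    first = True
    for path in paths:
        if not parser.can_fetch(user_agent, path):
            continue  # disallowed: skip without spending a request
        if not first:
            sleep(delay)  # wait before every request after the first
        first = False
        yield path


candidates = ["/admin/users", "/products/1", "/search?q=x", "/about"]
for path in polite_paths(ROBOTS_TXT, "example-scraper", candidates):
    print("fetch", path)
```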

User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /

Blocks everyone except Googlebot, which follows its own, more specific group instead of the * group. Aggressive: the site owner only wants their data in one engine.
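
To see the group selection at work, the same file can be evaluated for several user-agents with the standard urllib.robotparser module. This is a small sketch for illustration; the allowed_agents helper and the example-scraper agent name are hypothetical.

```python
from urllib import robotparser

ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
"""


def allowed_agents(robots_txt, agents, path):
    """Map each user-agent name to whether it may fetch `path`."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # A named group is preferred over the "*" group when one matches the agent.
    return {agent: parser.can_fetch(agent, path) for agent in agents}


result = allowed_agents(ROBOTS_TXT, ["Googlebot", "GPTBot", "example-scraper"], "/products/1")
for agent, ok in result.items():
    print(agent, "allowed" if ok else "blocked")
```

Only Googlebot comes back as allowed; every other agent falls through to the * group and is blocked.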

Need this customized for your stack?

Custom calculators, comparison dashboards, scraping ROI models — happy to build them for your team. Send a brief to info@luba.media.