Eyal Rosenthal · Web scraping at scale

Resources.

The curated index. Everything on this site, organized by what you're trying to do.

If you're new to web scraping

  1. Getting Started with Web Scraping

    Zero to first working scraper in 30 minutes. Read this first.

  2. Web Scraping Glossary

    Every term defined plainly.

  3. FAQ

    The 25 most-asked questions.

  4. Legal & Ethics

    What's OK to scrape, what isn't, what the gray zones look like.

Picking your stack

  1. Tools Comparison

    Scrapy vs Playwright vs Beautiful Soup vs ScrapingBee vs DIY.

  2. Scrapy vs Playwright vs Selenium

    Decision tree for which framework wins.

  3. Best Residential Proxies 2026

    Webshare vs Bright Data vs Oxylabs vs IPRoyal.

  4. Cost Calculator

    DIY vs Apify vs ScrapingBee vs Bright Data, with your numbers.

How to scrape specific sites

  1. Amazon

    The hardest mainstream e-commerce target.

  2. Google Search

    And the two cheap alternatives that solve 90% of use cases.

  3. Twitter / X

    Honest answer: don't. Use the API.

  4. LinkedIn

    Honest answer: you can't do it safely. Here are the four legitimate alternatives.

  5. Reddit

    The official API is the right answer.

  6. Yelp

    Their Fusion API plus selective DIY scraping.

  7. Wikipedia

    The easy target everyone overcomplicates.

  8. YouTube

    Videos, transcripts, channel data — three different toolchains.

  9. Indeed

    Heavy anti-bot defenses. Managed services earn their cost here.

  10. eBay

    The friendliest major e-commerce target.

Going deeper

  1. Self-Healing AI Extractors

    The 2026 schema-driven scraping pattern.

  2. Anti-bot Bypass 2026

    Cloudflare, DataDome, PerimeterX. The full playbook.

  3. $5/mo VPS vs $1,200/mo ScrapingBee

    The pipeline-as-product math.

  4. 100 Production Scrapers, One Repo

    Six patterns that cover almost every brief.

  5. SEC EDGAR + XBRL

    From filings to clean CSV in 30 seconds.

Free tools (browser-side, no signup)

  1. Cost Calculator

    DIY vs Apify vs ScrapingBee vs Bright Data — for your volume.

  2. robots.txt Checker

    Will this site allow your scraper? A per-user-agent verdict.
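
For readers who would rather run the robots.txt check locally, here is a minimal sketch of what a per-user-agent verdict can look like using Python's standard-library `urllib.robotparser`. It is not the code behind the checker listed above. The function name `robots_verdicts`, the sample rules, and the `example.com` URLs are all assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def robots_verdicts(robots_txt, url, user_agents):
    """Return {user_agent: allowed?} for one URL (hypothetical helper).

    robots_txt is the raw text of a robots.txt file, so the check runs
    offline; fetch the file yourself before calling this.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {ua: parser.can_fetch(ua, url) for ua in user_agents}


# Illustrative rules only, not taken from any real site.
SAMPLE_ROBOTS = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    agents = ["BadBot", "MyScraper"]
    for page in ("https://example.com/products",
                 "https://example.com/private/page"):
        print(page, robots_verdicts(SAMPLE_ROBOTS, page, agents))
```

Passing the file contents as text keeps the sketch free of network calls. Note that `urllib.robotparser` follows the original robots.txt conventions, so sites relying on wildcard patterns in paths may be judged differently by other parsers.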

Production demos in the public repo

Every demo on this site is a runnable Python project. Browse the 46 demos or jump straight to the source on GitHub.

For machine readers