Resources.
The curated index. Everything on this site, organized by what you're trying to do.
If you're new to web scraping
- Getting Started with Web Scraping
Zero to first working scraper in 30 minutes. Read this first.
- Web Scraping Glossary
Every term defined plainly.
- FAQ
The 25 most-asked questions.
- Legal & Ethics
What's OK to scrape, what isn't, what the gray zones look like.
Picking your stack
- Tools Comparison
Scrapy vs Playwright vs Beautiful Soup vs ScrapingBee vs DIY.
- Scrapy vs Playwright vs Selenium
Decision tree for which framework wins.
- Best Residential Proxies 2026
Webshare vs Bright Data vs Oxylabs vs IPRoyal.
- Cost Calculator
DIY vs Apify vs ScrapingBee vs Bright Data, with your numbers.
How to scrape specific sites
- Amazon
The hardest mainstream e-commerce target.
- Google Search
And the two cheap alternatives that solve 90% of use cases.
- Twitter / X
Honest: don't. Use the API.
- LinkedIn
Honest: you can't safely. Here are the four legitimate alternatives.
- Reddit
The official API is the right answer.
- Yelp
Their Fusion API + selective DIY.
- Wikipedia
The easy target everyone overcomplicates.
- YouTube
Videos, transcripts, channel data — three different toolchains.
- Indeed
Anti-bot heavy. The managed services earn their cost.
- eBay
The friendliest major e-commerce target.
Going deeper
- Self-Healing AI Extractors
The 2026 schema-driven scraping pattern.
- Anti-bot Bypass 2026
Cloudflare, DataDome, PerimeterX. The full playbook.
- $5/mo VPS vs $1,200/mo ScrapingBee
The pipeline-as-product math.
- 100 Production Scrapers, One Repo
Six patterns that cover almost every brief.
- SEC EDGAR + XBRL
From filings to clean CSV in 30 seconds.
Free tools (browser-side, no signup)
- Cost Calculator
DIY vs Apify vs ScrapingBee vs Bright Data — for your volume.
- robots.txt Checker
Will this site allow your scraper? Per user-agent verdict.
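
For readers who want the same kind of per-user-agent verdict from a script, here is a minimal sketch using Python's standard-library `urllib.robotparser`. It is not the checker's actual implementation. The function name `robots_verdicts`, the sample rules, the bot names, and the `example.com` URLs are all hypothetical, chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser


def robots_verdicts(robots_txt, url, user_agents):
    """Return {user_agent: allowed?} for one URL, given robots.txt text.

    Hypothetical helper: parses the rules from a string so the example
    runs offline. A real check would first download /robots.txt.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in user_agents}


# Made-up rules: one bot is blocked everywhere, everyone else is only
# blocked from /private/.
SAMPLE_ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    agents = ["BadBot", "MyScraper"]
    for url in ("https://example.com/public", "https://example.com/private/page"):
        print(url, robots_verdicts(SAMPLE_ROBOTS_TXT, url, agents))
```

To check a live site instead, `RobotFileParser` also provides `set_url()` and `read()` to fetch the file before calling `can_fetch()`. A permissive robots.txt is only one signal; the Legal & Ethics page covers the rest.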
Production demos in the public repo
Every demo on this site is a runnable Python project. Browse the 46 demos or jump straight to the source on GitHub.
For machine readers
- llms.txt — citation-friendly summary for AI search engines
- sitemap.xml — every page on this site
- rss.xml — subscribe to new tutorials
- robots.txt — crawler permissions (yes, all major bots are allowed)