Y Combinator Companies Bulk Extractor
YC Companies Directory Bulk Extractor
Pull every Y Combinator-funded company via the public api.ycombinator.com/v0.1/companies API. This maps to a recurring class of real Upwork briefs: VCs, sales-intel teams, recruitment agencies, and sourcing-tool builders post jobs every month asking for "scrape every YC company in batch X" or "every Active YC company with team_size > 10."
Built 2026-05-03 as Demo #28 — second hybrid-mode demo (real recurring brief class, not a pure pattern).
Run
. ~/freelance/.venv/bin/activate
cd ~/freelance/portfolio_demos/yc_companies_extractor
python extract.py --reset --batch P26 --max-pages 3 # quick smoke test
python extract.py --batch P26 # latest batch only
python extract.py --batch P26 --status Active # active P26 companies
python extract.py # full directory (~5000 companies)
Result (P26 batch slice)
- 75 P26 companies extracted in 3 pages ✅
- Per-row fields: id, name, website, yc_url, batch, status, team_size, one-liner, industries, tags, regions, locations, badges ✅
- Resume + dedupe via in-CSV id set ✅
- 0.4s politeness sleep between pages ✅
- Tenacity retries with exponential backoff ✅
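The resume, dedupe, sleep, and retry behaviour listed above could look roughly like the sketch below. It is not the code in extract.py: the function names, the `fetch_page` callback, and the retry defaults are assumptions made for illustration, and the retry helper is a plain standard-library loop standing in for Tenacity so the sketch runs without extra dependencies.

```python
import csv
import os
import time
from typing import Callable, Dict, List, Optional, Set

Row = Dict[str, object]


def load_seen_ids(csv_path: str) -> Set[str]:
    """Return the ids already written to the CSV (empty set if no file yet)."""
    if not os.path.exists(csv_path):
        return set()
    with open(csv_path, newline="", encoding="utf-8") as fh:
        return {row["id"] for row in csv.DictReader(fh)}


def with_backoff(call: Callable[[], List[Row]], attempts: int = 4,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> List[Row]:
    """Retry `call`, doubling the wait after each failure (1s, 2s, 4s, ...)."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


def crawl(fetch_page: Callable[[int], List[Row]], csv_path: str,
          fields: List[str], max_pages: Optional[int] = None,
          pause: float = 0.4,
          sleep: Callable[[float], None] = time.sleep) -> int:
    """Append unseen rows page by page; return how many rows were written.

    `fetch_page(page)` is a hypothetical callback that returns one page of
    already-flattened rows, or an empty list once the directory is exhausted.
    """
    seen = load_seen_ids(csv_path)  # resume: ids from any earlier run
    is_new_file = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    written = 0
    page = 1
    with open(csv_path, "a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields, extrasaction="ignore")
        if is_new_file:
            writer.writeheader()
        while max_pages is None or page <= max_pages:
            rows = with_backoff(lambda: fetch_page(page), sleep=sleep)
            if not rows:
                break
            for row in rows:
                key = str(row["id"])
                if key in seen:
                    continue  # dedupe: already in the CSV
                writer.writerow(row)
                seen.add(key)
                written += 1
            page += 1
            sleep(pause)  # politeness pause between pages
    return written
```

Because the seen-id set is rebuilt from the CSV on every start, an interrupted run can be restarted with the same command and only new rows are appended.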
Why this beats hand-rolling the same scraper
The naive bid solution is HTML-scraping ycombinator.com/companies, which is fragile and JS-heavy. The API has been stable for years and ships clean structured data, including filterable fields the HTML doesn't surface (long descriptions, badges, region tags).
The brief-class fit covers:
- Sales intel: filter to Active + specific industry → outbound list
- Recruitment: filter to team_size > N and a region → talent pipeline
- Investment scouting: latest batch + specific tag (e.g., AI) → deal-flow source
- Competitive intel: track when a competitor's portfolio company changes status
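As a rough illustration of those four use cases, the filters can be expressed as one small function plus a snapshot diff. This is a sketch only: `filter_rows`, `status_changes`, and their parameters are hypothetical names, and it assumes rows are dicts whose industries, tags, and regions are still lists (before being flattened for CSV).

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Row = Dict[str, object]


def _has(values: object, wanted: str) -> bool:
    """Case-insensitive membership test that tolerates a missing list."""
    return any(str(v).lower() == wanted.lower() for v in (values or []))


def filter_rows(rows: Iterable[Row], *, status: Optional[str] = None,
                batch: Optional[str] = None,
                team_size_over: Optional[int] = None,
                industry: Optional[str] = None, tag: Optional[str] = None,
                region: Optional[str] = None) -> Iterator[Row]:
    """Yield rows matching every filter that was supplied."""
    for row in rows:
        if status and row.get("status") != status:
            continue
        if batch and row.get("batch") != batch:
            continue
        # A missing team size is treated as 0, so it never passes "> N".
        if team_size_over is not None and (row.get("team_size") or 0) <= team_size_over:
            continue
        if industry and not _has(row.get("industries"), industry):
            continue
        if tag and not _has(row.get("tags"), tag):
            continue
        if region and not _has(row.get("regions"), region):
            continue
        yield row


def status_changes(old_rows: Iterable[Row],
                   new_rows: Iterable[Row]) -> List[Tuple[object, object, object]]:
    """Compare two snapshots by id and return (id, old_status, new_status)."""
    before = {r["id"]: r.get("status") for r in old_rows}
    return [(r["id"], before[r["id"]], r.get("status"))
            for r in new_rows
            if r["id"] in before and before[r["id"]] != r.get("status")]
```

For example, `filter_rows(rows, status="Active", industry="B2B")` would produce a sales-intel list, and `status_changes(last_week, today)` would flag companies whose status moved between two extractions.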
Adapting to a related brief
The same pattern works on Crunchbase Pro (paid), Dealroom, AngelList Talent, and Indie Hackers: swap in the new endpoint and the field projection.
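One way to keep that swap small is to treat each source as an endpoint paired with a projection function. The sketch below is an assumption about structure, not the demo's actual code: the `Source` dataclass and the `example` entry are hypothetical, and the camelCase keys read by `project_yc` (`teamSize`, `oneLiner`, `url`) are assumed names for the API's JSON fields that should be checked against a live response.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def _join(values: Any) -> str:
    """Flatten a list field into one CSV cell; missing lists become ''."""
    return "; ".join(str(v) for v in (values or []))


def project_yc(rec: Record) -> Record:
    """Map one API record to the per-row fields listed in this README.

    The source-side key names are assumptions, not confirmed API fields.
    """
    return {
        "id": rec.get("id"),
        "name": rec.get("name"),
        "website": rec.get("website"),
        "yc_url": rec.get("url"),
        "batch": rec.get("batch"),
        "status": rec.get("status"),
        "team_size": rec.get("teamSize"),
        "one-liner": rec.get("oneLiner"),
        "industries": _join(rec.get("industries")),
        "tags": _join(rec.get("tags")),
        "regions": _join(rec.get("regions")),
        "locations": _join(rec.get("locations")),
        "badges": _join(rec.get("badges")),
    }


def project_example(rec: Record) -> Record:
    """Hypothetical projection for some other directory's record shape."""
    return {
        "id": rec.get("uuid"),
        "name": rec.get("title"),
        "website": rec.get("homepage"),
    }


@dataclass(frozen=True)
class Source:
    name: str
    endpoint: str
    project: Callable[[Record], Record]


SOURCES = {
    "yc": Source("yc", "https://api.ycombinator.com/v0.1/companies", project_yc),
    # Placeholder entry: endpoint and field names are made up for illustration.
    "example": Source("example", "https://example.com/api/companies", project_example),
}


def project_all(source: Source, records: List[Record]) -> List[Record]:
    """Apply the source's projection to a page of raw records."""
    return [source.project(r) for r in records]
```

With this layout the pagination, retry, and CSV code stays untouched when a new target is added; only a new `Source` entry and its projection need writing.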
Hire me to build this for your stack
Same patterns, your target site. Send the brief and I'll quote fixed-price within 24 hours.
info@luba.media