Eyal Rosenthal · Web scraping at scale

GitHub Org / User Bulk Repo Extractor

GitHub Org / User Bulk Repo Extractor — Stars / Lang / License / Topics CSV

Bulk-extracts the public repos of N GitHub orgs or users through the GitHub REST API, up to the --max-pages limit per owner. It maps to a recurring Upwork brief: "give me a CSV of all repos for these N GitHub orgs: stars, language, last commit, license, archived."

Built 2026-05-03 as Demo #40 — sixth hybrid-mode (real-recurring-brief) demo.

Run

. ~/freelance/.venv/bin/activate
cd ~/freelance/portfolio_demos/github_org_repos_extractor
export GITHUB_TOKEN=ghp_...    # optional; raises the rate limit from 60 to 5,000 requests/hour
python extract.py --owners anthropics,openai,vercel,supabase
python extract.py --owners gaearon --type user --max-pages 5

Result

  • 628 repos extracted across 4 orgs (anthropics: 79, openai: 200, vercel: 200, supabase: 149) ✅
  • Per-row: owner, name, language, stars, forks, open_issues, watchers, size_kb, license, default_branch, is_fork/archived/disabled/private, topics, created_at, updated_at, pushed_at, description, URL ✅
  • Auto-paginates up to --max-pages (default 3, max 100 repos/page) ✅
  • Use cases: dev-tool sales (target customers), recruitment (team tech profiling), OSS supply-chain, VC scouting ✅
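
The pagination and per-row mapping described above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of extract.py: the helper names (repos_url, iter_repos, flatten, fetch_json) are hypothetical, and the output columns are a subset of the fields listed above with assumed names. The endpoints and JSON keys are those of GitHub's public "list repositories" REST endpoints.

```python
import json
import os
import urllib.request
from typing import Callable, Iterator

API_ROOT = "https://api.github.com"


def repos_url(owner: str, owner_type: str, page: int, per_page: int = 100) -> str:
    """Build the list-repos URL for an org or a user (hypothetical helper)."""
    kind = "orgs" if owner_type == "org" else "users"
    return f"{API_ROOT}/{kind}/{owner}/repos?per_page={per_page}&page={page}"


def fetch_json(url: str) -> list:
    """GET a URL and decode JSON, sending GITHUB_TOKEN if it is set."""
    headers = {"Accept": "application/vnd.github+json"}
    token = os.environ.get("GITHUB_TOKEN")
    if token:
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def iter_repos(
    owner: str,
    owner_type: str,
    fetch: Callable[[str], list],
    max_pages: int = 3,
    per_page: int = 100,
) -> Iterator[dict]:
    """Yield raw repo objects page by page, stopping at max_pages or a short page."""
    for page in range(1, max_pages + 1):
        batch = fetch(repos_url(owner, owner_type, page, per_page))
        yield from batch
        if len(batch) < per_page:  # a short page means there is nothing left
            break


def flatten(repo: dict) -> dict:
    """Map one API repo object to a flat CSV row (assumed column names)."""
    license_info = repo.get("license") or {}  # "license" is null for unlicensed repos
    return {
        "owner": repo["owner"]["login"],
        "name": repo["name"],
        "language": repo.get("language"),
        "stars": repo.get("stargazers_count"),
        "license": license_info.get("spdx_id"),
        "is_fork": repo.get("fork"),
        "archived": repo.get("archived"),
        "topics": ";".join(repo.get("topics") or []),
        "pushed_at": repo.get("pushed_at"),
    }
```

Under these assumptions, `rows = [flatten(r) for r in iter_repos("vercel", "org", fetch_json)]` would collect up to 300 rows for one org, which can then be written out with csv.DictWriter. Passing the fetch function in as a parameter keeps the pagination logic testable without network access.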

Adapting

  • Drop forks: df = pd.read_csv("repos.csv"); df = df[~df.is_fork]
  • Stack profile: df.groupby(["owner","language"]).size().unstack()
  • Active-team signal: df[df.pushed_at > "2025-01-01"] (ISO 8601 timestamps compare correctly as strings)
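
The three adaptations above can be combined into one short pandas session. The sketch below runs on a tiny inline sample rather than the real repos.csv; the sample rows are invented for illustration, and the column names (owner, name, language, is_fork, pushed_at) are assumed to match the extractor's output.

```python
import io

import pandas as pd

# Invented sample rows standing in for repos.csv (illustration only).
SAMPLE_CSV = """owner,name,language,is_fork,pushed_at
acme,api,Python,False,2025-03-10T12:00:00Z
acme,web,TypeScript,False,2024-06-01T08:30:00Z
acme,forked-lib,Python,True,2025-02-01T00:00:00Z
beta,cli,Go,False,2025-01-15T09:00:00Z
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Parse timestamps explicitly so comparisons are done on dates, not strings.
df["pushed_at"] = pd.to_datetime(df["pushed_at"], utc=True)

# 1. Drop forks so only first-party repos remain.
originals = df[~df["is_fork"]]

# 2. Stack profile: repo count per owner and language, 0 where a language is unused.
stack = originals.groupby(["owner", "language"]).size().unstack(fill_value=0)

# 3. Active-team signal: repos pushed to since the start of 2025.
cutoff = pd.Timestamp("2025-01-01", tz="UTC")
active = originals[originals["pushed_at"] > cutoff]

print(stack)
print(active[["owner", "name", "pushed_at"]])
```

Converting pushed_at to datetimes is optional for a simple cutoff, but it makes later steps such as resampling by month or computing days since last push straightforward.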

Hire me to build this for your stack

Same patterns, your target site. Send the brief and I'll quote fixed-price within 24 hours.

info@luba.media