Eyal Rosenthal · Web scraping at scale

CKAN Open-Data Extractor — One Scraper for 200+ Government Portals

CKAN Open-Data Portal Bulk Extractor

Bulk-extract dataset metadata from any CKAN-based open-data portal: data.gov.uk, data.gov.au, data.gov.br, Toronto Open Data, the EU Open Data Portal, and 200+ municipal portals globally. CKAN portals expose the same /api/3/action/package_search endpoint, so one scraper covers the entire ecosystem.
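To make the shared endpoint concrete, here is a minimal sketch of a package_search call using only the standard library. The function names build_search_url and search are hypothetical, not the ones in extract.py, and the sketch assumes the portal serves the API over HTTPS at the hostname root (some portals use a separate API hostname or a path prefix).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(portal: str, query: str, rows: int = 5, start: int = 0) -> str:
    """Return the package_search URL for a CKAN portal hostname (assumed layout)."""
    params = urlencode({"q": query, "rows": rows, "start": start})
    return f"https://{portal}/api/3/action/package_search?{params}"


def search(portal: str, query: str, rows: int = 5) -> list:
    """Fetch one page of matching datasets. Performs a network request."""
    with urlopen(build_search_url(portal, query, rows), timeout=30) as resp:
        payload = json.load(resp)
    # CKAN action responses carry a top-level "success" flag.
    if not payload.get("success"):
        raise RuntimeError(f"CKAN API error: {payload.get('error')}")
    return payload["result"]["results"]
```

Switching portals then means changing only the first argument, for example search("data.gov.au", "electricity").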

Maps to a recurring Upwork brief: "scrape datasets from this government / city open-data portal." Common buyers: civic-tech, journalists, urban-planning analytics, ESG / regulatory research.

Built 2026-05-03 as Demo #37 — fifth hybrid-mode (real-recurring-brief) demo.

Run

. ~/freelance/.venv/bin/activate
cd ~/freelance/portfolio_demos/ckan_opendata_extractor

python extract.py --portal data.gov.uk --queries "air quality,traffic accidents,covid"
python extract.py --portal data.gov.au --queries "wildfire,electricity"
python extract.py --portal ckan0.cf.opendata.inter.prod-toronto.ca --queries "311"  # Toronto Open Data
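The two flags above could be parsed along these lines. This is a sketch with argparse, not the actual extract.py source. The parse_args helper is hypothetical, and splitting --queries on commas is an assumption based on the commands shown.

```python
import argparse


def parse_args(argv=None):
    """Parse --portal and --queries; queries arrive as one comma-separated string."""
    parser = argparse.ArgumentParser(description="Search a CKAN portal for datasets.")
    parser.add_argument("--portal", required=True, help="portal hostname, e.g. data.gov.uk")
    parser.add_argument("--queries", required=True, help="comma-separated search terms")
    args = parser.parse_args(argv)
    # Split on commas and drop empty or whitespace-only terms.
    args.queries = [q.strip() for q in args.queries.split(",") if q.strip()]
    return args
```

Passing argv explicitly keeps the parser testable without touching sys.argv.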

Result (data.gov.uk)

  • 15 datasets extracted across 3 queries (air quality, traffic accidents, covid) ✅
  • Per-row: title, org, license, tags, groups, resource count, formats list, primary download URL, primary file size, last-modified ✅
  • One scraper works against any CKAN portal — only the --portal hostname changes ✅
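The per-row fields listed above map onto standard keys of a CKAN package object. A minimal sketch of that flattening step follows. The flatten_package name and the output column names are hypothetical, and treating the first resource as the "primary" one is an assumption rather than the demo's confirmed rule.

```python
def flatten_package(pkg: dict) -> dict:
    """Reduce one CKAN package (dataset) dict to a flat row."""
    resources = pkg.get("resources") or []
    # Assumption: the first listed resource is the primary download.
    primary = resources[0] if resources else {}
    # Deduplicate formats and normalise case ("csv" and "CSV" count once).
    formats = sorted({r["format"].upper() for r in resources if r.get("format")})
    return {
        "title": pkg.get("title"),
        "org": (pkg.get("organization") or {}).get("title"),
        "license": pkg.get("license_title"),
        "tags": [t["name"] for t in pkg.get("tags") or []],
        "groups": [g["name"] for g in pkg.get("groups") or []],
        "resource_count": len(resources),
        "formats": formats,
        "primary_url": primary.get("url"),
        "primary_size": primary.get("size"),
        "last_modified": pkg.get("metadata_modified"),
    }
```

Every lookup uses .get with a fallback because portals differ in which optional fields they populate; a dataset with no organization or no resources still yields a complete row with empty values.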

Why this beats hand-rolling

CKAN portals look different on the web (each city brands its own portal), but they all expose the same JSON API under the hood. The naive bid scrapes the HTML, which is fragile and has to be redone for every site. The CKAN API is stable across hundreds of portals and returns clean JSON that includes download URLs.
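Because the response shape is the same on every portal, the paging logic can be shared too. The sketch below walks package_search results using its start and rows parameters and the count field in the response. The iter_packages name and the fetch_page callable are hypothetical; fetch_page(start, rows) is assumed to return the "result" object of one package_search response.

```python
from typing import Callable, Iterator


def iter_packages(fetch_page: Callable[[int, int], dict], page_size: int = 100) -> Iterator[dict]:
    """Yield every matching dataset by advancing start until count is reached."""
    start = 0
    while True:
        result = fetch_page(start, page_size)
        batch = result.get("results", [])
        if not batch:
            break  # defensive stop if the portal returns an empty page
        yield from batch
        start += len(batch)
        if start >= result.get("count", 0):
            break
```

Injecting fetch_page keeps the loop independent of the HTTP client, so it can be exercised against a fake portal before it touches a real one. CKAN caps rows per request (1000 unless the site configures otherwise), so page_size should stay at or below that.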

Hire me to build this for your stack

Same patterns, your target site. Send the brief and I'll quote fixed-price within 24 hours.

info@luba.media