CKAN Open-Data Extractor
CKAN Open-Data Portal Bulk Extractor
Bulk-extract dataset metadata from any CKAN-based open-data portal, such as data.gov.uk, data.gov.au, data.gov.br, Toronto Open Data, the EU Open Data Portal, and 200+ municipal portals globally. All expose the same /api/3/action/package_search endpoint, so one scraper covers the entire ecosystem.
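As a rough illustration of the shared endpoint, here is a minimal sketch of one package_search call using only the standard library. The function names, the User-Agent string, and the default of 5 rows are assumptions made for this example, not the contents of extract.py.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_search_url(portal: str, query: str, rows: int = 5, start: int = 0) -> str:
    """Build a package_search URL for a CKAN portal hostname (hypothetical helper)."""
    params = urlencode({"q": query, "rows": rows, "start": start})
    return f"https://{portal}/api/3/action/package_search?{params}"


def search_packages(portal: str, query: str, rows: int = 5) -> list[dict]:
    """Fetch the raw dataset dicts for one query. This performs a network call."""
    req = Request(
        build_search_url(portal, query, rows),
        headers={"User-Agent": "ckan-sketch/0.1"},  # assumed value, any string works
    )
    with urlopen(req, timeout=30) as resp:
        payload = json.load(resp)
    # CKAN wraps every response in {"success": ..., "result": ...}
    if not payload.get("success"):
        raise RuntimeError(f"CKAN API error: {payload.get('error')}")
    return payload["result"]["results"]
```

Calling search_packages("data.gov.uk", "air quality") should then return a list of dataset dicts, and only the hostname argument needs to change for another CKAN portal.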
Maps to a recurring Upwork brief: "scrape datasets from this government / city open-data portal." Common buyers: civic-tech, journalists, urban-planning analytics, ESG / regulatory research.
Built 2026-05-03 as Demo #37 — fifth hybrid-mode (real-recurring-brief) demo.
Run
. ~/freelance/.venv/bin/activate
cd ~/freelance/portfolio_demos/ckan_opendata_extractor
python extract.py --portal data.gov.uk --queries "air quality,traffic accidents,covid"
python extract.py --portal data.gov.au --queries "wildfire,electricity"
python extract.py --portal ckan0.cf.opendata.inter.prod-toronto.ca --queries "311" # Toronto Open Data
Result (data.gov.uk)
- 15 datasets extracted across 3 queries (air quality, traffic accidents, covid) ✅
- Per-row: title, org, license, tags, groups, resource count, formats list, primary download URL, primary file size, last-modified ✅
- One scraper works against any CKAN portal; only the --portal hostname changes ✅
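The per-row fields listed above could be derived from a CKAN dataset dict roughly as follows. This is a sketch under assumptions: the output column names are invented for illustration, "primary" is taken to mean the first resource, and last-modified is read from the dataset's metadata_modified field. The actual extract.py may choose differently.

```python
def flatten_package(pkg: dict) -> dict:
    """Flatten one CKAN dataset dict into a single row (hypothetical column names)."""
    resources = pkg.get("resources") or []
    # Assumption: treat the first listed resource as the primary download
    primary = resources[0] if resources else {}
    # Deduplicate formats and normalize case, skipping resources with no format
    formats = sorted({r["format"].upper() for r in resources if r.get("format")})
    return {
        "title": pkg.get("title"),
        "org": (pkg.get("organization") or {}).get("title"),
        "license": pkg.get("license_title"),
        "tags": [t.get("name") for t in pkg.get("tags") or []],
        "groups": [g.get("name") for g in pkg.get("groups") or []],
        "resource_count": len(resources),
        "formats": formats,
        "primary_url": primary.get("url"),
        "primary_size": primary.get("size"),
        "last_modified": pkg.get("metadata_modified"),
    }
```

Fields that a portal leaves out come back as None or an empty list rather than raising, which matters because portals vary in how completely they fill in metadata.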
Why this beats hand-rolling
CKAN portals look different on the web (each city brands its own portal), but they all expose the same JSON API under the hood. The naive bid solution scrapes the HTML, which is fragile and needs per-site work. The CKAN API is stable across hundreds of portals and ships clean JSON with download URLs.
Hire me to build this for your stack
Same patterns, your target site. Send the brief and I'll quote fixed-price within 24 hours.
info@luba.media