Eyal Rosenthal · Web scraping at scale

Government Facility Monitor

Government Facility Monitor — Schema-Aware Wikitable Scraper with Diff Alerts

Drop-in monitor for any government, municipal, or open-data wikitable listing. It walks the listing, extracts structured records (name, location, attributes), diffs them against the last snapshot, and alerts on change.

Built 2026-05-03 to address the pattern in Upwork brief ~022050752 "Data Pipeline for Municipal Facility Data". The demo targets a public page (Wikipedia's "List of national parks of the United States") because real municipal portals usually need client context. The pattern generalizes to city library directories, hospital lists, post offices, court records, and public-school directories: anywhere a government body publishes a structured table.
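
The diff step can be sketched roughly as follows. This is a minimal illustration, not the code in monitor.py: the function name diff_snapshots, the snapshot shape (a dict keyed by record name), and the "Park A" / "Park B" values are all assumptions made up for the example.

```python
from typing import Dict

Record = Dict[str, str]


def diff_snapshots(old: Dict[str, Record], new: Dict[str, Record]) -> Dict[str, list]:
    """Compare two snapshots keyed by record name (hypothetical shape)."""
    added = sorted(k for k in new if k not in old)
    removed = sorted(k for k in old if k not in new)
    changed = []
    # For records present in both snapshots, report each field whose value differs.
    for key in sorted(old.keys() & new.keys()):
        for field in sorted(old[key].keys() | new[key].keys()):
            before, after = old[key].get(field), new[key].get(field)
            if before != after:
                changed.append((key, field, before, after))
    return {"added": added, "removed": removed, "changed": changed}


# Placeholder data, not real park figures.
previous = {"Park A": {"location": "State X", "area_acres": "100"}}
current = {
    "Park A": {"location": "State X", "area_acres": "105"},
    "Park B": {"location": "State Y", "area_acres": "200"},
}
report = diff_snapshots(previous, current)
```

Running the same comparison on two identical snapshots yields three empty lists, which is the behavior an idempotent re-run relies on.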

Run

cd ~/freelance/portfolio_demos/gov_facility_monitor
. ~/freelance/.venv/bin/activate
python monitor.py --reset
python monitor.py --once                # baseline
python monitor.py --simulate-changes 5  # demo: tweak state
python monitor.py --once                # see diff fire
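
For readers wondering how the three flags above might be wired up, here is a hedged sketch using argparse. Only the flag names come from the commands above; the help texts, defaults, and the build_parser function are assumptions, and the real monitor.py may parse its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI definition mirroring the flags used in the Run section."""
    parser = argparse.ArgumentParser(prog="monitor.py")
    parser.add_argument("--reset", action="store_true",
                        help="discard saved state (assumed behavior)")
    parser.add_argument("--once", action="store_true",
                        help="run a single pass and exit")
    parser.add_argument("--simulate-changes", type=int, default=0, metavar="N",
                        help="alter N records in saved state for a demo (assumed behavior)")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--simulate-changes", "5"])
```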

Result (last verified)

  • 63 national parks extracted on baseline ✅
  • Re-run identical: 0 changes (idempotent) ✅
  • Simulated 5 area-acres changes → all 5 detected, change report rendered, CSV emitted ✅
  • Composes the scaffold with tenacity retries and Slack, email, or print-mode alerts
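
One way the CSV report could be produced from a list of detected changes is sketched below. The column names, the changes_to_csv helper, and the sample row are assumptions for illustration; the actual report layout is not shown here.

```python
import csv
import io
from typing import Iterable, Tuple

Change = Tuple[str, str, str, str]  # (name, field, before, after), an assumed layout


def changes_to_csv(changes: Iterable[Change]) -> str:
    """Render change tuples as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "field", "before", "after"])
    writer.writerows(changes)
    return buffer.getvalue()


# Placeholder change, not a real measurement.
csv_text = changes_to_csv([("Park A", "area_acres", "100", "105")])
```

Writing to an in-memory buffer keeps the function easy to test; a caller could write the returned text to a file under reports/.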

Why this beats hand-coded scrapers

  • Tenacity retry: transient HTTP failures self-recover, no silent skips
  • Idempotent: safe to re-run; state per source means no dedup drift
  • Schema-aware: header-cell parsing means it works on any wikitable, not just one
  • Cron-friendly: single-shot exit, audit log, no daemons
  • Generalizable: change one function (extract_items) and the same scaffold handles a Shopify store, a real-estate portal, or a Companies House search result
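
To make the "schema-aware" point concrete, here is a small sketch of header-cell parsing using only the standard library. It is not the extract_items function from monitor.py, whose signature is not shown; WikitableParser and extract_records are hypothetical names. It assumes the first row of the table holds the headers, and it ignores nested tables, rowspan, and colspan.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class WikitableParser(HTMLParser):
    """Collect the rows of the first table whose class list includes 'wikitable'."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._in_table = False
        self._done = False
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if tag == "table" and "wikitable" in (dict(attrs).get("class") or "").split():
            self._in_table = True
        elif self._in_table and tag == "tr":
            self._row = []
        elif self._in_table and tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if not self._in_table:
            return
        if tag in ("th", "td") and self._cell is not None and self._row is not None:
            # Collapse internal whitespace so keys and values are clean.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
        elif tag == "table":
            self._in_table = False
            self._done = True


def extract_records(html: str) -> List[Dict[str, str]]:
    """Use the header row as the schema and map every body row onto it."""
    parser = WikitableParser()
    parser.feed(html)
    if not parser.rows:
        return []
    header, *body = parser.rows
    keys = [h.lower().replace(" ", "_") for h in header]
    return [dict(zip(keys, row)) for row in body]


# Placeholder table, not real data.
sample = """
<table class="wikitable sortable">
  <tr><th>Name</th><th>Location</th><th>Area acres</th></tr>
  <tr><th>Park A</th><td>State X</td><td>100</td></tr>
</table>
"""
records = extract_records(sample)
```

Because the keys come from whatever header cells the table happens to have, the same function would return differently shaped records for a table with different columns, without any code change.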

Files

  • monitor.py — main script (built from _template/scaffold.py)
  • config.json — source URLs + comments
  • state/ — per-source JSON snapshots (gitignored)
  • reports/ — CSV reports per run (gitignored)
  • monitor.log — audit log
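
As an illustration of per-source JSON snapshots, the sketch below derives one state file per source URL and saves it via a temporary file followed by a rename. The hashing scheme, the helper names, and the example.org URL are assumptions; the real file naming under state/ is not documented here.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Dict


def state_path(state_dir: Path, source_url: str) -> Path:
    """Map a source URL to its own snapshot file (assumed naming scheme)."""
    digest = hashlib.sha256(source_url.encode("utf-8")).hexdigest()[:16]
    return state_dir / f"{digest}.json"


def load_snapshot(path: Path) -> Dict[str, dict]:
    """Return the saved snapshot, or an empty dict on the first run."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_snapshot(path: Path, snapshot: Dict[str, dict]) -> None:
    """Write to a temporary file, then rename it over the target."""
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(snapshot, indent=2, sort_keys=True), encoding="utf-8")
    tmp.replace(path)


# Exercise the helpers in a throwaway directory with placeholder data.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = state_path(Path(tmp_dir) / "state", "https://example.org/listing")
    first_load = load_snapshot(path)
    save_snapshot(path, {"Park A": {"area_acres": "100"}})
    second_load = load_snapshot(path)
```

Keeping one file per source means a baseline for one listing is never compared against records from another.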

Hire me to build this for your stack

Same patterns, your target site. Send the brief and I'll quote fixed-price within 24 hours.

info@luba.media