46 production scraping & pipeline demos.
All real, all runnable. Each one ships with a working extractor, a sample output, and the operational discipline that makes it production-grade.
Demo · #01
Self-Healing AI Web Extractor
A web extractor that does not break when sites redesign. Pages are converted to text and passed to an LLM with a strict JSON schema; the schema (not the markup)
Demo · #02
Real-Time Competitor Price Watch
Catalog-monitoring pipeline that snapshots a competitor's product list on a schedule, diffs against the last run, and posts a structured Slack alert the moment
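A minimal sketch of the snapshot-and-diff step described above, assuming each run reduces the catalog to a plain `{product_id: price}` mapping. The function name `diff_snapshots` and the sample SKUs are hypothetical and not taken from the demo itself.

```python
from typing import Dict


def diff_snapshots(previous: Dict[str, float], current: Dict[str, float]) -> Dict[str, list]:
    """Compare two {product_id: price} snapshots from consecutive runs."""
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    # Products present in both runs whose price differs: (id, old, new).
    changed = sorted(
        (pid, previous[pid], current[pid])
        for pid in set(previous) & set(current)
        if previous[pid] != current[pid]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical sample data standing in for two scheduled runs.
last_run = {"sku-1": 19.99, "sku-2": 5.00}
this_run = {"sku-2": 4.50, "sku-3": 12.00}
report = diff_snapshots(last_run, this_run)
```

An alert would only need to be sent when at least one of the three lists in `report` is non-empty.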
Demo · #03
GitHub Trending Monitor
Daily monitor across GitHub's trending pages (Python / TypeScript / General). Alerts on new repos entering the trending list, star-count deltas, and language dr
Demo · #04
Government Facility Monitor
Drop-in monitor for any government / municipal / open-data wikitable listing. Extracts structured facility records (name, location, attributes), diffs against l
Demo · #05
BigCommerce Store Monitor
Production Python monitor that crawls a BigCommerce storefront's category pages on a schedule, detects inventory changes (new products, removed products, price
Demo · #06
Hacker News Monitor
Recurring monitor across Hacker News front page + newest + best feeds. Tracks every story, diffs score and comment_count between runs, fires structured alerts o
Demo · #07
Hugging Face Trending Monitor
Daily monitor across Hugging Face's trending models, datasets, and spaces via the public Hub API. Alerts on new entries, like surges, download spikes, and trend
Demo · #08
arXiv Papers Monitor
Daily monitor across arXiv submission categories (cs.AI / cs.LG / cs.CL — easily extended) via the public arXiv Atom API. Alerts on new submissions, paper revis
Demo · #09
RemoteOK Jobs Monitor
Hourly job-board monitor across RemoteOK's public JSON feed, filtered by tag (python, javascript, ai — easily extended). Alerts on new postings, salary updates,
Demo · #10
Shopify Storefront Monitor
Drop-in monitor for any public Shopify store via the universal /products.json endpoint every Shopify storefront exposes by default. Tracks per-variant price, co
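One way per-variant tracking could look, sketched under the assumption that the `/products.json` payload has a top-level `products` list whose items carry `handle` and a `variants` list with `id`, `price`, and `available`. The helper name `flatten_variants` and the sample payload are illustrative only.

```python
from typing import Any, Dict, List


def flatten_variants(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn a products.json-style payload into one row per variant."""
    rows: List[Dict[str, Any]] = []
    for product in payload.get("products", []):
        for variant in product.get("variants", []):
            rows.append(
                {
                    "product_handle": product.get("handle"),
                    "variant_id": variant.get("id"),
                    "price": variant.get("price"),  # assumed to arrive as a string
                    "available": variant.get("available"),
                }
            )
    return rows


# Hypothetical payload fragment, not captured from a real store.
sample = {
    "products": [
        {
            "handle": "blue-mug",
            "variants": [
                {"id": 101, "price": "12.00", "available": True},
                {"id": 102, "price": "14.00", "available": False},
            ],
        }
    ]
}
variant_rows = flatten_variants(sample)
```

Keying later diffs on `variant_id` rather than on the product keeps a price change on one variant from being masked by its siblings.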
Demo · #11
Substack & Newsletter Publication Monitor
Generic RSS 2.0 monitor for Substack publications, Medium pubs, Ghost blogs, WordPress feeds — anything with public RSS. Tracks per-post link, title, author, ca
Demo · #12
PDF Invoice Extractor
Production batch extractor that ingests a directory of invoice PDFs and produces two structured CSVs (per-line-item + per-invoice summary). Two-pass strategy: p
Demo · #13
Sitemap → JSON-LD Bulk Extractor
Two-stage pipeline mapping to the 'scrape every X on this site' brief class. Stage 1: pull sitemap.xml (handles sitemap-index nesting), filter URLs by pattern. Stag
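Stage 1 can be sketched roughly as follows, assuming standard sitemaps.org XML. The `fetch` callable is a hypothetical stand-in for whatever HTTP client the demo uses, and `collect_urls` is an illustrative name.

```python
import re
import xml.etree.ElementTree as ET
from typing import Callable, List

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def collect_urls(xml_text: str, fetch: Callable[[str], str], pattern: str) -> List[str]:
    """Return page URLs matching `pattern`, following sitemap-index nesting."""
    root = ET.fromstring(xml_text)
    urls: List[str] = []
    if root.tag.endswith("sitemapindex"):
        # An index lists child sitemaps; recurse into each one.
        for loc in root.findall("sm:sitemap/sm:loc", SITEMAP_NS):
            urls.extend(collect_urls(fetch(loc.text.strip()), fetch, pattern))
    else:
        # A urlset lists pages directly; keep only those matching the pattern.
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
            url = loc.text.strip()
            if re.search(pattern, url):
                urls.append(url)
    return urls


# Hypothetical in-memory documents so the sketch runs without network access.
INDEX = (
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<sitemap><loc>https://example.com/sitemap-products.xml</loc></sitemap>"
    "</sitemapindex>"
)
CHILD = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/products/a</loc></url>"
    "<url><loc>https://example.com/blog/b</loc></url>"
    "</urlset>"
)
fake_fetch = {"https://example.com/sitemap-products.xml": CHILD}.__getitem__
found = collect_urls(INDEX, fake_fetch, r"/products/")
```

Passing `fetch` in as a parameter keeps the nesting logic testable offline; a real run would supply a function that downloads and decodes each child sitemap.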
Demo · #14
Lead-Gen Contact Extractor
Take a list of company URLs → fetch homepage + auto-discovered contact/about/team/press/imprint pages → extract emails, phones, social handles (twitter, linkedi
Demo · #15
Wikipedia Infobox Bulk Extractor
Take a list of Wikipedia article titles → fetch via MediaWiki parse API → locate infobox table → flatten label/value rows to per-article CSV. Maps to 'extract t
Demo · #16
OpenStreetMap POI Bulk Extractor
Pull every point-of-interest of a given OSM tag (cafes / pharmacies / EV chargers / schools / clinics — any amenity, shop, or leisure tag) within a bounding box
Demo · #17
Paginated Catalog Scraper
Walks every page of a paginated listing (search results, e-com catalogs, real-estate listings, classifieds). Different from single-page monitors — iterates N pa
Demo · #18
PyPI Releases Monitor
Recurring monitor across PyPI's public RSS feeds — /rss/updates.xml (last 40 globally) + per-project /rss/project/<name>/releases.xml. Maps to dependency intel
Demo · #19
GitHub Releases Tracker
Multi-repo GitHub release monitor via public REST API. Token-friendly: anonymous 60/h auto-upgrades to authenticated 5K/h with $GITHUB_TOKEN. Maps to multi-repo
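The anonymous-to-authenticated upgrade mentioned above might be implemented along these lines. This is a sketch, not the demo's actual code: the function name `github_headers` and the injectable `env` parameter are assumptions made for testability.

```python
import os
from typing import Dict, Mapping, Optional


def github_headers(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build request headers, adding auth only when GITHUB_TOKEN is set."""
    env = os.environ if env is None else env
    headers = {"Accept": "application/vnd.github+json"}
    token = env.get("GITHUB_TOKEN", "").strip()
    if token:
        # With a token, requests count against the higher authenticated limit.
        headers["Authorization"] = f"Bearer {token}"
    return headers


anonymous = github_headers({})
authenticated = github_headers({"GITHUB_TOKEN": "example-token"})
```

Because the token is optional, the same code path serves both a quick anonymous trial and a scheduled run with credentials.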
Demo · #20
YouTube Channel Monitor
Track new uploads, view-count surges, rating changes across N YouTube channels via public RSS (/feeds/videos.xml?channel_id=UC...) — no API key, no quota cost.
Demo · #21
Product Hunt Launches Monitor
Track Product Hunt's daily launches via public Atom feed (/feed + /feed?category=tech + /feed?category=ai). Maps to launch tracking / startup intel / SaaS compet
Demo · #22
CVE / NVD Security Monitor
Track newly published CVEs via the NVD v2 API. Alerts on CVSS re-scoring (analysts revising severity), status transitions (Awaiting Analysis → Analyzed → Modifi
Demo · #23
Stack Exchange Q&A Monitor
Track new Stack Overflow / Stack Exchange questions by tag (170+ SE sites supported). Diff alerts on new questions, score deltas, view-count surges, is_answered
Demo · #24
Wayback Machine History Extractor
Extract historical snapshots of any URL via Wayback CDX API. Maps to 'what did this page look like in year X?', 'track competitor messaging over time', 'audit h
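A rough sketch of querying the CDX API for one year of captures and turning the JSON response into snapshot links. It assumes the `output=json` response form, where the first row holds the field names; the helper names and the sample rows are hypothetical.

```python
from typing import Dict, List
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(target: str, year: int) -> str:
    """Build a CDX query URL covering a single calendar year."""
    params = {"url": target, "output": "json", "from": f"{year}0101", "to": f"{year}1231"}
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def rows_to_snapshots(rows: List[List[str]]) -> List[Dict[str, str]]:
    """Convert CDX JSON rows (header row first) into dicts with a replay URL."""
    if not rows:
        return []
    header, *records = rows
    snapshots = []
    for record in records:
        entry = dict(zip(header, record))
        entry["snapshot_url"] = (
            f"https://web.archive.org/web/{entry['timestamp']}/{entry['original']}"
        )
        snapshots.append(entry)
    return snapshots


# Hypothetical response rows, trimmed to the fields used here.
sample_rows = [
    ["timestamp", "original", "statuscode"],
    ["20200315120000", "http://example.com/", "200"],
]
query = cdx_query_url("example.com", 2020)
snapshots = rows_to_snapshots(sample_rows)
```

Fetching `query` with any HTTP client and passing the decoded JSON to `rows_to_snapshots` would cover the 'what did this page look like in year X?' case.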
Demo · #25
SEC EDGAR Bulk Extractor
Built specifically against active Upwork brief ~022050416 (SEC EDGAR Extraction, US, fixed-price, 5-10 proposals, $700+ verified client, posted 2026-05-02) — no
Demo · #26
CoinGecko Market Monitor
Track top-N coins by market cap via CoinGecko's free public API. Alerts on price changes, rank shifts (coin enters/exits top-N), 24h % moves, ATH-distance. Acce
Demo · #27
NPM Registry Monitor
Per-package npm release monitor via public registry. Parallel to PyPI demo for JS ecosystem. Tracks latest 20 versions per package + dist-tags. Alerts on new ve
Demo · #28
Y Combinator Companies Bulk Extractor
Bulk-extract every YC company via public api.ycombinator.com endpoint. Real recurring Upwork brief class — VCs, sales-intel, recruitment, outbound platforms pos
Demo · #29
DEV.to Articles Monitor
Track new articles + edits across DEV.to per-tag feeds. Maps to dev community / content discovery / dev-tool brand monitoring briefs. Complements Substack #11 (
Demo · #30
GitHub Issues Multi-Repo Extractor
Track issues + PRs across N GitHub repos via public REST API. Diff alerts on state transitions (open → closed), comment surges, label updates, last-update drift
Demo · #31
Steam Catalog Bulk Extractor
Bulk extract Steam game metadata via public Store API. Real recurring Upwork brief class — gaming media, indie analytics, market research, recommender-system da
Demo · #32
Open Library Books Extractor
Bulk-extract book metadata from Open Library (IA's open catalog) — search.json + optional per-work enrichment. Maps to book-recommendation, library catalog, use
Demo · #33
HN Algolia Search Monitor
Watch entire HN history via Algolia search API for brand mentions, topic surges, old-thread re-discovery. Different from #06 (live feeds): this is full-history
Demo · #34
App Store Metadata Bulk Extractor
Bulk-extract iOS app metadata via public iTunes Search + Lookup API. Real recurring Upwork brief class — mobile-app analytics, ASO consultancies, recommender da
Demo · #35
WHOIS / RDAP Bulk Domain Lookup
Bulk WHOIS-style lookup via RDAP (modern HTTPS+JSON WHOIS replacement). Maps to domain investor expiry tracking, infosec DNSSEC + NS audits, brand-protection ty
Demo · #36
Crates.io Rust Package Monitor
Per-crate Rust release monitor via public crates.io API. Parallel to PyPI #18 + npm #27; completes package-registry trio across Python/JS/Rust. Tracks latest 2
Demo · #37
CKAN Open-Data Extractor
Bulk-extract dataset metadata from any CKAN-based open-data portal (data.gov.uk, data.gov.au, NYC OpenData, EU Open Data, 200+ municipal portals globally). All
Demo · #38
Open-Meteo Weather Extractor
Pull current + 7-day weather forecast for any list of cities via Open-Meteo (free, no key, no quota — DWD ICON / NOAA GFS / ECMWF IFS aggregated). Maps to weath
Demo · #39
PubMed Research Paper Extractor
Bulk-extract medical research papers via NCBI E-utilities (PubMed API). Maps to 'PubMed papers on topic/drug/disease' briefs: medical research, pharma competitiv
Demo · #40
GitHub Org / User Bulk Repo Extractor
Bulk-extract every public repo from N orgs/users via GitHub REST API. Real recurring Upwork brief — dev-tool sales targeting, recruitment tech-stack profiling,
Demo · #41
Mastodon Hashtag Monitor
Track Mastodon public hashtag timelines across multiple instances. Maps to social listening / brand mention briefs — practical X/Twitter alternative now that X
Demo · #42
Crossref DOI Bulk Extractor
Extract academic publication metadata via Crossref (official DOI registry covering ~150M+ scholarly works). Maps to academic metadata / citation graph / journal
Demo · #43
WordPress Plugin Directory Bulk Extractor
Bulk-extract WP plugin metadata via WP.org plugin info API. Real recurring Upwork brief — WP agencies (competitive intel), SEO-tooling startups, plugin devs (TA
Demo · #44
Nominatim Bulk Geocoder
Bulk-geocode addresses (string → lat/lon + structured) via OSM Nominatim. Maps to 'geocode N addresses' briefs: real estate, logistics, delivery routing, retail
Demo · #45
Dental IT MSP Lead Finder
Bulk lead-list builder for vertical-niche B2B prospecting. Sweeps DuckDuckGo + Bing organic results across 20 metros × 4 query variants, fetches each candidate
Demo · #46
Shopify Bookstore Lead Finder
Verifier-driven lead-finder for B2B vertical-niche prospecting. Two-stage pipeline: candidate sourcing from curated seed lists + Bing search → platform verifica