Eyal Rosenthal · Web scraping at scale

Web scraping at real-business scale.

I run a €500K/year data business in Madrid on scrapers I built. The same code patterns ship to clients on Upwork and directly. Native English, async, fixed-price preferred, no calls.

46 · Production demos
40+ · Brief classes
€500K · My own data revenue
$0.0003 · Per-page AI extraction

What I'm known for

  1. Self-healing AI extractors

    20/20 records survive a full DOM scramble. The schema is the contract; CSS selectors aren't.

  2. Anti-bot fluency, named

    Cloudflare, DataDome, PerimeterX. curl_cffi, nodriver, residential rotation — every tool, every tradeoff.

  3. Pipeline-as-product

    State, alerts, observability. Turns one-shot scrapes into $300-1,000/mo retainer pipelines.

  4. Multi-source orchestration

    Parallel fan-out across HN + arXiv + HF + GitHub + PWC, normalized schema, per-source failure isolation.
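Per-source failure isolation can be sketched with `asyncio.gather(..., return_exceptions=True)`. Every source runs concurrently, and a source that fails is reported instead of raised. This is a minimal sketch, not the production orchestrator: the fetchers are hypothetical stubs standing in for real HN, arXiv, and GitHub calls, and the record shape `{"source", "id", "title"}` is an assumed normalized schema.

```python
import asyncio
from typing import Awaitable, Callable


# Hypothetical stub fetchers; real ones would call each source's API and
# normalize results into {"source", "id", "title"} records.
async def fetch_hn() -> list[dict]:
    await asyncio.sleep(0)
    return [{"source": "hn", "id": "1", "title": "Show HN: a scraper"}]


async def fetch_arxiv() -> list[dict]:
    await asyncio.sleep(0)
    raise TimeoutError("arXiv API timed out")  # simulated outage


async def fetch_github() -> list[dict]:
    await asyncio.sleep(0)
    return [{"source": "github", "id": "octo/repo", "title": "octo/repo"}]


async def fan_out(fetchers: dict[str, Callable[[], Awaitable[list[dict]]]]):
    """Run every source concurrently; one failure never sinks the others."""
    names = list(fetchers)
    results = await asyncio.gather(
        *(fetchers[name]() for name in names), return_exceptions=True
    )
    records: list[dict] = []
    errors: dict[str, str] = {}
    for name, result in zip(names, results):
        if isinstance(result, BaseException):
            errors[name] = repr(result)  # record the failure, keep going
        else:
            records.extend(result)
    return records, errors


records, errors = asyncio.run(
    fan_out({"hn": fetch_hn, "arxiv": fetch_arxiv, "github": fetch_github})
)
print(len(records), sorted(errors))  # 2 ['arxiv']
```

The errors dict is what would feed a "source down" alert, while the surviving records still flow through the normalized pipeline.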

Latest demos

Demo #01 · Self-Healing AI Web Extractor: Survives DOM Changes That Break Every CSS Selector

A web extractor that does not break when sites redesign. Pages are converted to text and passed to an LLM with a strict JSON schema; the schema (not the markup) is the contract.
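Here is a minimal sketch of the schema-as-contract idea. It assumes a hypothetical `call_llm(prompt) -> str` client and a made-up three-field product schema. HTML is flattened to text with the standard library's `html.parser`. The model's JSON is then rejected unless it matches the schema exactly, so downstream code never depends on markup. The type check is hand-rolled here, and a JSON Schema validator would be a natural swap.

```python
import json
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects visible text nodes, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


# Hypothetical contract for a product page: field name -> required type.
SCHEMA = {"name": str, "price": float, "in_stock": bool}


def validate(record: dict) -> dict:
    """Reject model output that does not match the schema exactly."""
    if set(record) != set(SCHEMA):
        raise ValueError(f"fields {sorted(record)} != {sorted(SCHEMA)}")
    for field, expected in SCHEMA.items():
        if not isinstance(record[field], expected):
            raise ValueError(f"{field} is not {expected.__name__}")
    return record


def extract(html: str, call_llm) -> dict:
    """call_llm is a stand-in for any model client: prompt in, JSON string out."""
    prompt = (f"Return only JSON with keys {list(SCHEMA)} "
              f"for this page:\n{html_to_text(html)}")
    return validate(json.loads(call_llm(prompt)))


# Two markups for the same product; no CSS selector survives the change.
OLD = "<div class='p'><h1>Widget</h1><span>19.99</span></div>"
NEW = "<section><p>19.99</p><h2>Widget</h2><script>track()</script></section>"


def fake_llm(prompt: str) -> str:
    assert "track()" not in prompt  # script contents never reach the model
    return '{"name": "Widget", "price": 19.99, "in_stock": true}'


print(extract(OLD, fake_llm) == extract(NEW, fake_llm))  # True
```

The stub model returns fixed JSON, so this only shows the plumbing. The resilience in practice comes from the model reading page text rather than from any selector.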

Demo #02 · Real-Time Competitor Price Watch: Slack Alert in Under 60s on Any Price or Stock Change

Catalog-monitoring pipeline that snapshots a competitor's product list on a schedule, diffs against the last run, and posts a structured Slack alert the moment a price or stock level changes.
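The snapshot-and-diff step might look like this. The sketch assumes each snapshot is a dict keyed by SKU with `price` and `in_stock` fields, which are hypothetical field names. The payload targets a Slack incoming webhook, which accepts a JSON body with a `text` key. The HTTP POST and the scheduler are left out.

```python
import json


def diff_catalog(prev: dict, curr: dict) -> list:
    """Compare two snapshots keyed by SKU; values hold 'price' and 'in_stock'."""
    changes = []
    for sku in sorted(curr.keys() - prev.keys()):
        changes.append(f"NEW {sku} at {curr[sku]['price']:.2f}")
    for sku in sorted(prev.keys() - curr.keys()):
        changes.append(f"GONE {sku}")
    for sku in sorted(prev.keys() & curr.keys()):
        old, new = prev[sku], curr[sku]
        if old["price"] != new["price"]:
            changes.append(f"PRICE {sku}: {old['price']:.2f} -> {new['price']:.2f}")
        if old["in_stock"] != new["in_stock"]:
            state = "back in stock" if new["in_stock"] else "out of stock"
            changes.append(f"STOCK {sku}: {state}")
    return changes


def slack_payload(changes: list) -> str:
    """JSON body for a Slack incoming webhook (it reads the "text" key)."""
    return json.dumps({"text": "Competitor changes:\n" + "\n".join(changes)})


last_run = {"A1": {"price": 10.0, "in_stock": True},
            "B2": {"price": 5.0, "in_stock": True}}
this_run = {"A1": {"price": 9.5, "in_stock": True},
            "B2": {"price": 5.0, "in_stock": False},
            "C3": {"price": 7.25, "in_stock": True}}
changes = diff_catalog(last_run, this_run)
if changes:  # only alert when something actually moved
    print(slack_payload(changes))
```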

Demo #03 · GitHub Trending Monitor: Daily Tech-Stack Intelligence with Viral-Repo Alerts

Daily monitor across GitHub's trending pages (Python / TypeScript / General). Alerts on new repos entering the trending list, star-count deltas, and language …
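This is a sketch of the alert logic. It assumes each run has already reduced a trending page to a mapping of `owner/name` to star count. The 500-star `viral_delta` threshold is an illustrative assumption, not a tuned value. The same function could run once per language page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    kind: str    # "entered" or "viral"
    repo: str    # "owner/name"
    detail: str


def trending_alerts(prev: dict, curr: dict, viral_delta: int = 500) -> list:
    """prev/curr map "owner/name" -> star count seen on one run's trending list.

    viral_delta is a hypothetical threshold: stars gained between runs that
    counts as a viral repo.
    """
    alerts = []
    for repo, stars in sorted(curr.items()):
        if repo not in prev:
            alerts.append(Alert("entered", repo, f"{stars} stars"))
        elif stars - prev[repo] >= viral_delta:
            gained = stars - prev[repo]
            alerts.append(Alert("viral", repo, f"+{gained} stars since last run"))
    return alerts


yesterday = {"acme/fast-scraper": 1000, "acme/old-tool": 200}
today = {"acme/fast-scraper": 1600, "acme/old-tool": 250, "newco/agent": 90}
for alert in trending_alerts(yesterday, today):
    print(alert.kind, alert.repo, alert.detail)
# viral acme/fast-scraper +600 stars since last run
# entered newco/agent 90 stars
```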

Demo #04 · Government Facility Monitor: Schema-Aware Wikitable Scraper with Diff Alerts

Drop-in monitor for any government / municipal / open-data wikitable listing. Extracts structured facility records (name, location, attributes), diffs them against …
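Wikitables share one shape: a `<table class="wikitable">` with a header row. That shared shape is what makes a drop-in monitor plausible. Below is a stdlib-only sketch using `html.parser` that turns the first wikitable into header-keyed records. It ignores `colspan`/`rowspan` and nested tables, which real listings sometimes use. The sample table is invented.

```python
from html.parser import HTMLParser


class WikitableParser(HTMLParser):
    """Collects rows of the first <table class="wikitable"> as lists of cell text.

    Simplifications: no colspan/rowspan handling and no nested tables.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self.in_table = False
        self.done = False
        self.cell = None  # text fragments of the cell being read, else None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "table" and "wikitable" in classes and not self.done:
            self.in_table = True
        elif self.in_table and tag == "tr":
            self.rows.append([])
        elif self.in_table and tag in ("td", "th"):
            self.cell = []

    def handle_endtag(self, tag):
        if not self.in_table:
            return
        if tag in ("td", "th") and self.cell is not None:
            # Collapse whitespace left over from inline tags and newlines.
            self.rows[-1].append(" ".join("".join(self.cell).split()))
            self.cell = None
        elif tag == "table":
            self.in_table, self.done = False, True

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)


def to_records(html: str) -> list:
    """Header row becomes the keys; rows with a different cell count are dropped."""
    parser = WikitableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body if len(row) == len(header)]


SAMPLE = """
<table class="wikitable sortable">
  <tr><th>Name</th><th>Location</th><th>Beds</th></tr>
  <tr><td><a href="/wiki/X">General Hospital</a></td><td>Madrid</td><td>350</td></tr>
  <tr><td>Clinic <i>Norte</i></td><td>Bilbao</td><td>40</td></tr>
</table>
"""
for record in to_records(SAMPLE):
    print(record)
```

If the records are keyed by a stable column such as the name, they can be compared between runs to produce diff alerts.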

Demo #05 · BigCommerce Store Monitor: Twice-Daily Inventory Crawl with Email Change Reports

Production Python monitor that crawls a BigCommerce storefront's category pages on a schedule and detects inventory changes (new products, removed products, price …)
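The email side of the report could be built with the standard library's `email.message.EmailMessage`. This sketch assumes the crawl has already produced lists of added and removed product names, plus a mapping of repriced items. The addresses are placeholders, and sending is left to `smtplib`.

```python
from email.message import EmailMessage


def change_report(store: str, added: list, removed: list,
                  repriced: dict) -> EmailMessage:
    """Build (but do not send) a plain-text change report for one crawl.

    repriced maps product name -> (old_price, new_price).
    """
    lines = [f"Inventory changes for {store}", ""]
    lines += [f"+ {name}" for name in added]
    lines += [f"- {name}" for name in removed]
    lines += [f"~ {name}: {old:.2f} -> {new:.2f}"
              for name, (old, new) in repriced.items()]
    total = len(added) + len(removed) + len(repriced)

    msg = EmailMessage()
    msg["Subject"] = f"[{store}] {total} inventory change(s)"
    msg["From"] = "monitor@example.com"  # placeholder addresses
    msg["To"] = "client@example.com"
    msg.set_content("\n".join(lines))
    return msg


msg = change_report("shop.example", ["Blue Mug"], [], {"Red Mug": (12.0, 10.0)})
print(msg["Subject"])  # [shop.example] 2 inventory change(s)
# Sending would be e.g. smtplib.SMTP("smtp.example.com").send_message(msg)
```

For the twice-daily schedule, a crontab entry such as `0 6,18 * * *` would work; those times are only an example.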

Demo #06 · Hacker News Monitor: Multi-Feed Score & Comment Tracking with Viral-Story Alerts

Recurring monitor across the Hacker News front page, newest, and best feeds. Tracks every story, diffs score and comment_count between runs, and fires structured alerts …
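This is a sketch of the between-run diff. It assumes stories come from the official Hacker News API at `hacker-news.firebaseio.com/v0`. There, each item carries `score` and `descendants`, the total comment count, which is presumably what the demo stores as comment_count. The points-per-hour cutoff for a viral story is an illustrative assumption. The example runs offline on invented snapshots; `fetch_json` is shown but not called.

```python
import json
import urllib.request

HN_API = "https://hacker-news.firebaseio.com/v0"


def fetch_json(path: str):
    """e.g. fetch_json("topstories.json") or fetch_json(f"item/{story_id}.json")."""
    with urllib.request.urlopen(f"{HN_API}/{path}", timeout=10) as resp:
        return json.load(resp)


def story_deltas(prev: dict, curr: dict, hours: float,
                 viral_pts_per_hour: float = 100.0) -> list:
    """prev/curr map story id -> item dict from two runs `hours` apart."""
    out = []
    for sid, item in curr.items():
        before = prev.get(sid)
        if before is None:
            continue  # first sighting: nothing to diff against yet
        d_score = item.get("score", 0) - before.get("score", 0)
        d_comments = item.get("descendants", 0) - before.get("descendants", 0)
        out.append({"id": sid, "score_delta": d_score,
                    "comment_delta": d_comments,
                    "viral": d_score / hours >= viral_pts_per_hour})
    return out


# Offline example: two snapshots taken 2 hours apart (invented numbers).
earlier = {1: {"score": 50, "descendants": 10}, 2: {"score": 300, "descendants": 80}}
later = {1: {"score": 260, "descendants": 45}, 2: {"score": 320, "descendants": 90},
         3: {"score": 5, "descendants": 0}}
for row in story_deltas(earlier, later, hours=2):
    print(row)
```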

Demo #07 · Hugging Face Trending Monitor: Daily Model Release Alerts via Hub API

Daily monitor across Hugging Face's trending models, datasets, and spaces via the public Hub API. Alerts on new entries, like surges, download spikes, and trend …
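Every recurring monitor needs the previous run's snapshot. This is a minimal sketch of that state handling. It assumes like counts per repo id have already been pulled from the Hub API; the fetch itself is not shown. The state file is written to a temp file and swapped in with `os.replace`, so an interrupted run cannot corrupt it. The 1.5x surge ratio and the file name are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def load_state(path: Path) -> dict:
    """Last run's snapshot, or {} on the very first run."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file in the same directory, then swap it in atomically,
    so a crash mid-write never leaves a half-written snapshot behind."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(state, f, sort_keys=True)
    os.replace(tmp_name, path)


def like_surges(prev: dict, curr: dict, ratio: float = 1.5) -> list:
    """Repo ids whose like count grew by at least `ratio`x since the last run.
    Ids absent from (or at zero in) the previous run are skipped."""
    return sorted(repo for repo, likes in curr.items()
                  if prev.get(repo, 0) > 0 and likes / prev[repo] >= ratio)


with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "hf_state.json"   # hypothetical file name
    print(load_state(state_file))              # {} on the first run
    save_state(state_file, {"org/model-a": 100, "org/model-b": 40})
    previous = load_state(state_file)
    print(like_surges(previous, {"org/model-a": 110, "org/model-b": 80}))  # ['org/model-b']
```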

Demo #08 · arXiv Papers Monitor: Multi-Category Research Alerts via Atom API

Daily monitor across arXiv submission categories (cs.AI / cs.LG / cs.CL; easily extended) via the public arXiv Atom API. Alerts on new submissions, paper revisions, …
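This is a sketch of the Atom side, using `xml.etree.ElementTree` with the Atom namespace. It relies on two documented behaviours of the arXiv API. First, queries go to `export.arxiv.org/api/query` with `search_query`, `sortBy`, `sortOrder`, and `max_results`. Second, each entry's `<id>` is an abstract URL ending in a version suffix such as `v2`, so a version bump between runs signals a revision. The sample feed is invented, uses deliberately fake ids, and is trimmed to two fields.

```python
import re
import urllib.parse
import xml.etree.ElementTree as ET

ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def query_url(category: str, max_results: int = 50) -> str:
    """Newest submissions in one category, e.g. "cs.AI"."""
    params = {"search_query": f"cat:{category}", "sortBy": "submittedDate",
              "sortOrder": "descending", "max_results": max_results}
    return "http://export.arxiv.org/api/query?" + urllib.parse.urlencode(params)


def parse_feed(xml_text: str) -> dict:
    """Map base arXiv id -> {"version": int, "title": str} for each <entry>."""
    root = ET.fromstring(xml_text)
    papers = {}
    for entry in root.findall("atom:entry", ATOM):
        abs_url = entry.findtext("atom:id", default="", namespaces=ATOM).strip()
        match = re.search(r"abs/(.+)v(\d+)$", abs_url)
        if not match:
            continue
        raw_title = entry.findtext("atom:title", default="", namespaces=ATOM)
        papers[match.group(1)] = {"version": int(match.group(2)),
                                  "title": " ".join(raw_title.split())}
    return papers


def feed_changes(prev: dict, curr: dict) -> tuple:
    """(new ids, revised ids): revised means the version number went up."""
    new = sorted(curr.keys() - prev.keys())
    revised = sorted(k for k in curr.keys() & prev.keys()
                     if curr[k]["version"] > prev[k]["version"])
    return new, revised


SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>http://arxiv.org/abs/9999.00001v2</id>
    <title>Robust  Extraction
      from Changing Pages</title></entry>
  <entry><id>http://arxiv.org/abs/9999.00002v1</id><title>Another Paper</title></entry>
</feed>"""

previous = {"9999.00001": {"version": 1, "title": "Robust Extraction from Changing Pages"}}
print(feed_changes(previous, parse_feed(SAMPLE)))  # (['9999.00002'], ['9999.00001'])
```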

Demo #09 · RemoteOK Jobs Monitor: Tag-Filtered Hiring-Signal Alerts via Public JSON API

Hourly job-board monitor across RemoteOK's public JSON feed, filtered by tag (python, javascript, ai; easily extended). Alerts on new postings, salary updates, …
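This is a sketch of the tag filter and change detection. It assumes the feed is a JSON array of job objects with `id`, `position`, `company`, `tags`, `salary_min`, and `salary_max`. It also assumes the first element is a non-job notice rather than a posting. Verify both assumptions against the live feed before relying on them. The sample data is invented.

```python
def filter_jobs(feed: list, wanted: set) -> dict:
    """Keep postings whose tags overlap `wanted`, keyed by id for diffing."""
    jobs = {}
    for item in feed:
        if "id" not in item or "position" not in item:
            continue  # skips the non-job notice assumed to lead the feed
        tags = {str(t).lower() for t in item.get("tags") or []}
        if tags & wanted:
            jobs[str(item["id"])] = item
    return jobs


def job_alerts(prev: dict, curr: dict) -> list:
    """New postings plus salary-range changes on postings seen last run."""
    alerts = [f"NEW {curr[j]['position']} at {curr[j].get('company', '?')}"
              for j in sorted(curr.keys() - prev.keys())]
    for j in sorted(curr.keys() & prev.keys()):
        old = (prev[j].get("salary_min"), prev[j].get("salary_max"))
        new = (curr[j].get("salary_min"), curr[j].get("salary_max"))
        if old != new:
            alerts.append(f"SALARY {curr[j]['position']}: {old} -> {new}")
    return alerts


WANTED = {"python", "javascript", "ai"}
feed = [
    {"legal": "API terms notice"},
    {"id": "101", "position": "Python Developer", "company": "Acme",
     "tags": ["Python", "backend"], "salary_min": 60000, "salary_max": 80000},
    {"id": "102", "position": "Brand Designer", "company": "Beta", "tags": ["design"]},
]
current = filter_jobs(feed, WANTED)
previous = {"101": dict(current["101"], salary_max=75000)}
print(job_alerts(previous, current))
# ['SALARY Python Developer: (60000, 75000) -> (60000, 80000)']
```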

All 46 demos →

Start here

Five plain-English guides for getting from zero to your first production scraper. Read in order.

All tutorials →

Hire me to build the next one

Send a target site and the data you want. I'll send a fixed-price quote and a working sample within 24 hours.

info@luba.media