Web scraping
at real-business scale.

I run a €500K/year data business in Madrid on scrapers I built. Same code patterns ship to clients on Upwork and direct. Native English, async, fixed-price preferred, no calls.

46 · Production demos

40+ · Brief classes

€500K · My own data revenue

$0.0003 · Per-page AI extraction

What I'm known for

Self-healing AI extractors
20/20 records survive a full DOM scramble. The schema is the contract; CSS selectors aren't.
Anti-bot fluency, named
Cloudflare, DataDome, PerimeterX. curl_cffi, nodriver, residential rotation — every tool, every tradeoff.
Pipeline-as-product
State, alerts, observability. Turns one-shot scrapes into $300-1,000/mo retainer pipelines.
Multi-source orchestration
Parallel fan-out across HN + arXiv + HF + GitHub + PWC, normalized schema, per-source failure isolation.
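
To make "per-source failure isolation" concrete, here is a minimal sketch of the fan-out pattern, not the code from the demos. The fetchers are hypothetical stand-ins that return canned data (no network calls), and the normalized field names (`source`, `title`, `url`) are assumptions chosen for illustration.

```python
import asyncio


async def fetch_ok(name):
    # Hypothetical stand-in for a real source fetcher; returns raw items.
    await asyncio.sleep(0)
    return [{"title": f"{name} item", "link": f"https://example.com/{name}"}]


async def fetch_broken(name):
    # Simulates one source being down.
    raise RuntimeError(f"{name} is down")


def normalize(source, item):
    # Map a source-specific item onto one shared schema (assumed fields).
    return {"source": source, "title": item["title"], "url": item["link"]}


async def fan_out(fetchers):
    names = list(fetchers)
    # return_exceptions=True keeps one failing source from cancelling the rest.
    results = await asyncio.gather(
        *(fetchers[name](name) for name in names), return_exceptions=True
    )
    records, failures = [], {}
    for name, result in zip(names, results):
        if isinstance(result, Exception):
            failures[name] = str(result)  # record the failure, keep going
            continue
        records.extend(normalize(name, item) for item in result)
    return records, failures


records, failures = asyncio.run(
    fan_out({"hn": fetch_ok, "arxiv": fetch_broken, "github": fetch_ok})
)
print(len(records), failures)
```

The failing source ends up in `failures` while the other two still deliver normalized records, which is the property the bullet describes.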

Latest demos

Demo · #01

Self-Healing AI Web Extractor

A web extractor that does not break when sites redesign. Pages are converted to text and passed to an LLM with a strict JSON schema; the schema (not the markup)
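
A rough sketch of what "the schema is the contract" can look like on the validation side. This is an illustration under stated assumptions, not the demo's implementation: the `SCHEMA` fields are hypothetical, the LLM call itself is left out, and `raw` stands in for whatever JSON text the model returned.

```python
import json

# Hypothetical contract: field name -> required Python type.
SCHEMA = {"name": str, "price": float, "in_stock": bool}


def validate(record, schema=SCHEMA):
    """Return a list of contract violations; an empty list means the record passes."""
    errors = []
    for field, expected in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            # Note: a JSON integer parses as int and fails a float check;
            # widen the expected type if whole-number prices are valid.
            errors.append(f"{field}: expected {expected.__name__}")
    extra = set(record) - set(schema)
    errors.extend(f"unexpected field: {name}" for name in sorted(extra))
    return errors


def split_by_contract(raw_json):
    # Parse the model output and separate passing records from rejected ones.
    records = json.loads(raw_json)
    good = [r for r in records if not validate(r)]
    bad = [r for r in records if validate(r)]
    return good, bad


# Stand-in for LLM output: one valid record, one that violates the contract.
raw = (
    '[{"name": "Widget", "price": 9.5, "in_stock": true},'
    ' {"name": "Gadget", "price": "9.50"}]'
)
good, bad = split_by_contract(raw)
print(good)
print([validate(r) for r in bad])
```

Nothing in the check refers to markup or selectors, so a redesign only matters if the extracted values stop satisfying the contract.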

Demo · #02

Real-Time Competitor Price Watch

Catalog-monitoring pipeline that snapshots a competitor's product list on a schedule, diffs against the last run, and posts a structured Slack alert the moment
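
The snapshot-and-diff step can be sketched in a few lines. This is a hedged illustration, not the demo's code: it assumes each snapshot is a dict keyed by a product identifier (`sku` here is a hypothetical name) with a `price` field, and it only builds the alert text rather than posting it anywhere.

```python
def diff_snapshots(previous, current):
    """Compare two catalog snapshots keyed by product id."""
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    price_changes = {
        sku: (previous[sku]["price"], current[sku]["price"])
        for sku in sorted(previous.keys() & current.keys())
        if previous[sku]["price"] != current[sku]["price"]
    }
    return {"added": added, "removed": removed, "price_changes": price_changes}


def format_alert(diff):
    # Build a plain-text summary; delivery (Slack or otherwise) is out of scope here.
    lines = []
    lines += [f"NEW: {sku}" for sku in diff["added"]]
    lines += [f"REMOVED: {sku}" for sku in diff["removed"]]
    lines += [
        f"PRICE: {sku} {old} -> {new}"
        for sku, (old, new) in diff["price_changes"].items()
    ]
    return "\n".join(lines)


last_run = {"A1": {"price": 10.0}, "B2": {"price": 20.0}}
this_run = {"A1": {"price": 12.0}, "C3": {"price": 5.0}}

diff = diff_snapshots(last_run, this_run)
print(format_alert(diff))
```

An empty diff produces an empty string, so a scheduler can skip the alert when nothing changed between runs.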

Demo · #03

GitHub Trending Monitor

Daily monitor across GitHub's trending pages (Python / TypeScript / General). Alerts on new repos entering the trending list, star-count deltas, and language dr

Demo · #04

Government Facility Monitor

Drop-in monitor for any government / municipal / open-data wikitable listing. Extracts structured facility records (name, location, attributes), diffs against l

Demo · #05

BigCommerce Store Monitor

Production Python monitor that crawls a BigCommerce storefront's category pages on a schedule, detects inventory changes (new products, removed products, price

Demo · #06

Hacker News Monitor

Recurring monitor across Hacker News front page + newest + best feeds. Tracks every story, diffs score and comment_count between runs, fires structured alerts o

Demo · #07

Hugging Face Trending Monitor

Daily monitor across Hugging Face's trending models, datasets, and spaces via the public Hub API. Alerts on new entries, like surges, download spikes, and trend

Demo · #08

arXiv Papers Monitor

Daily monitor across arXiv submission categories (cs.AI / cs.LG / cs.CL — easily extended) via the public arXiv Atom API. Alerts on new submissions, paper revis

Demo · #09

RemoteOK Jobs Monitor

Hourly job-board monitor across RemoteOK's public JSON feed, filtered by tag (python, javascript, ai — easily extended). Alerts on new postings, salary updates,

All 46 demos →

Start here

Five plain-English guides for getting from zero to your first production scraper. Read in order.

7 min · Beginner

Getting Started with Web Scraping in 2026: From Zero to First Working Scraper in 30 Minutes

If you've never scraped a website before, start here. The minimum tools, the first working script, and the three things that will trip you up. Written for total beginners; no Python experience assumed.

3 min · Beginner

How to Scrape eBay Listings in 2026

eBay is the friendliest major e-commerce scraping target: light anti-bot, generous official API (5k requests/day free), and CSS structures that haven't drifted much in years. Here's the working stack.

4 min · Beginner

How to Scrape Reddit in 2026: Use the Official API (It's Cheap and the Workarounds Aren't)

Reddit closed public scraping in 2023 but kept the API affordable. PRAW + free OAuth tier handles 90% of use cases. The DIY scraping route exists but is brittle, ToS-risky, and unnecessary.

4 min · Beginner

How to Scrape Wikipedia (The Easy Target Everyone Overcomplicates)

Wikipedia is the simplest legitimate scraping target on the public internet. CC-BY-SA license, official APIs, no anti-bot. Here are the four ways to extract data and which to use when.

5 min · Beginner

How to Scrape YouTube Videos, Transcripts, and Channel Data in 2026

YouTube has three things you might want to extract: video files, transcripts, and metadata. Each has its own toolchain. yt-dlp + youtube-transcript-api + the Data API v3 cover 99% of use cases.

All tutorials →

Hire me to build the next one

Send a target site and the data you want. I'll send a fixed-price quote and a working sample within 24 hours.

info@luba.media
