Eyal Rosenthal · Web scraping at scale

Shopify Bookstore Lead Finder

Shopify Bookstore Lead Finder — Platform-Verified Leads via /products.json

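A pipeline that finds verified Shopify-hosted bookstores from a candidate list. It rests on two core checks: first it uses Shopify's /products.json endpoint (Shopify-specific; returns the catalog as JSON) to verify the platform, then it applies a book-shape filter (ISBN-pattern SKUs plus book/author/poetry/textbook keywords) to confirm the catalog belongs to a bookstore. Candidate sourcing comes before these checks; contact enrichment and tier scoring come after.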

Run

pip install requests beautifulsoup4 curl_cffi
python3 extract.py

Output: leads.csv (raw verified leads) and leads_qa.csv (US-filtered + tier-scored).

How it works

Stage 1 — Candidate sourcing. Combines a curated seed list of indie/specialty bookstore URLs with Bing search queries. The verifier handles the false-positive removal, so the candidate list can be noisy.
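A minimal sketch of the merge step, assuming the seed list and the search results both arrive as plain URL strings. The helper names normalize_domain and merge_candidates are hypothetical and not taken from extract.py. It reduces each URL to a bare domain so the same store is verified only once.

```python
from urllib.parse import urlsplit


def normalize_domain(url):
    """Reduce a URL or bare domain to a lowercase host without 'www.'.

    Returns None when no host can be parsed (hypothetical helper).
    """
    url = url.strip()
    if not url:
        return None
    if "://" not in url:
        url = "https://" + url  # urlsplit needs a scheme to find the host
    try:
        host = urlsplit(url).hostname  # lowercased, port removed
    except ValueError:
        return None
    if not host:
        return None
    return host[4:] if host.startswith("www.") else host


def merge_candidates(seed_urls, search_urls):
    """Combine both sources into an ordered, de-duplicated domain list."""
    domains = (normalize_domain(u) for u in [*seed_urls, *search_urls])
    return list(dict.fromkeys(d for d in domains if d))
```

Because the verifier rejects false positives later, this step only needs to avoid duplicate work; it makes no attempt to judge whether a domain is a bookstore.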

Stage 2 — Shopify verification. For each candidate, fetch /products.json?limit=50. This endpoint exists only on Shopify storefronts. If it returns a valid JSON payload with a products array, the domain is confirmed Shopify. Otherwise reject.
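The check described above can be sketched as follows. This is an illustration under stated assumptions, not the code in extract.py: it uses the standard library's urllib instead of requests or curl_cffi so it runs without extra installs, and the function names, User-Agent header, and timeout are hypothetical.

```python
import json
import urllib.request


def is_shopify_payload(body):
    """True if body parses as a JSON object with a top-level 'products' list."""
    try:
        data = json.loads(body)
    except (ValueError, TypeError):  # JSONDecodeError is a ValueError
        return False
    return isinstance(data, dict) and isinstance(data.get("products"), list)


def verify_shopify(domain, timeout=10.0):
    """Fetch /products.json?limit=50 and apply the payload check.

    Any connection, HTTP, or timeout error is treated as a rejection.
    """
    url = f"https://{domain}/products.json?limit=50"
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):  # URLError, HTTPError, timeouts are OSErrors
        return False
    return is_shopify_payload(body)
```

Keeping the payload check separate from the fetch makes the rule easy to test offline. Note that in this sketch an empty products array still counts as a Shopify confirmation; the bookstore filter is what rejects empty catalogs.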

Stage 3 — Bookstore filter. Apply two heuristics to the catalog:

  • ≥30% of sampled products contain book-shaped keywords (book, author, ISBN, paperback, novel, poetry, textbook, memoir, etc.)
  • OR ≥3 products with ISBN-pattern SKUs (matching ^(97[89])?\d{9,12}$)

Either signal qualifies. This rejects gift shops with a few books while accepting comic-book stores, RPG/tabletop publishers, used-book sellers, etc.
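The two heuristics can be sketched as a single predicate. The ISBN pattern and the thresholds come from the description above; everything else is an assumption for illustration: the function names, the choice to search title, product_type, body_html, and tags, and the use of word boundaries so that "notebook" does not count as "book". The keyword list covers only the words named above, not the full "etc." set.

```python
import re

ISBN_SKU = re.compile(r"^(97[89])?\d{9,12}$")
BOOK_WORDS = re.compile(
    r"\b(books?|authors?|isbn|paperbacks?|novels?|poetry|textbooks?|memoirs?)\b",
    re.IGNORECASE,
)


def product_text(product):
    """Join the text fields searched for keywords (field choice is assumed)."""
    tags = product.get("tags") or []
    if isinstance(tags, str):
        tags = [tags]
    parts = [product.get("title"), product.get("product_type"),
             product.get("body_html"), *tags]
    return " ".join(p for p in parts if isinstance(p, str))


def has_isbn_sku(product):
    """True if any variant SKU matches the ISBN pattern."""
    variants = product.get("variants") or []
    return any(ISBN_SKU.match((v.get("sku") or "").strip()) for v in variants)


def is_bookstore(products, keyword_share=0.30, min_isbn_products=3):
    """Either signal qualifies: keyword share or count of ISBN-SKU products."""
    if not products:
        return False
    keyword_hits = sum(1 for p in products if BOOK_WORDS.search(product_text(p)))
    isbn_hits = sum(1 for p in products if has_isbn_sku(p))
    return (keyword_hits / len(products) >= keyword_share
            or isbn_hits >= min_isbn_products)
```

A gift shop with one novel among ten products fails both tests, while a comics store whose titles contain no book keywords still passes on three ISBN-pattern SKUs.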

Stage 4 — Contact enrichment. Hit /, /pages/contact, /pages/about. Extract: company name, email (mailto: links + visible @), phone (tel: + US-pattern), country (footer hints + ZIP-code presence), services blurb (meta description / first paragraph).
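A rough sketch of the email and phone extraction, assuming the page HTML has already been fetched. It uses regular expressions from the standard library rather than beautifulsoup4 to stay self-contained; the patterns and function names are illustrative guesses, not the ones in extract.py, and the email pattern can pick up false matches such as asset filenames containing "@".

```python
import re

MAILTO = re.compile(r'mailto:([^"\'?>\s]+)', re.IGNORECASE)
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
TEL = re.compile(r"tel:([+\d\-\s().]+)", re.IGNORECASE)
# US-style number with separators, e.g. (303) 555-0100 or 303.555.0100
US_PHONE = re.compile(r"(?<!\d)\(?\d{3}\)?[\s.-]\d{3}[\s.-]\d{4}(?!\d)")


def normalize_us_phone(raw):
    """Strip to 10 digits, dropping a leading country code 1; else None."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits if len(digits) == 10 else None


def extract_contacts(html):
    """Collect de-duplicated emails and US phone numbers from raw HTML."""
    found = MAILTO.findall(html) + EMAIL.findall(html)
    emails = {e.lower() for e in found if "@" in e}
    phones = {normalize_us_phone(p)
              for p in TEL.findall(html) + US_PHONE.findall(html)}
    phones.discard(None)
    return {"emails": sorted(emails), "phones": sorted(phones)}
```

Normalizing phone numbers to ten digits lets a tel: link and the same number printed in the footer collapse into one entry.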

Stage 5 — Tier scoring.

  • Gold = full contact (email + phone) + ISBN-validated catalog
  • Silver = email OR phone
  • Bronze = URL only
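The tier rules above reduce to a few comparisons. A minimal sketch, with score_tier as a hypothetical name; it reads the rules literally, so a lead with both email and phone but no ISBN-validated catalog lands in Silver.

```python
def score_tier(email, phone, isbn_validated):
    """Map contact completeness and catalog validation to a tier label."""
    has_email, has_phone = bool(email), bool(phone)
    if has_email and has_phone and isbn_validated:
        return "Gold"
    if has_email or has_phone:
        return "Silver"
    return "Bronze"
```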

Why this is better than hand-curated lists

The interesting finding from the demo run: of 50 well-known US indie bookstores I tested as a seed list, only ONE (Tattered Cover) was on Shopify. Most run on IndieCommerce (the American Booksellers Association platform), Squarespace, or custom builds.

In that sample, a "Shopify bookstore" list built by hand would have been 98% false positives (49 of 50). Lists assembled from Google results have no way to catch this without programmatic verification. The verifier fixes that by checking the platform layer instead of the search-result layer.

Real Upwork brief this maps to

See PROPOSAL.md — written for a $100 fixed-price brief asking for verified US Shopify bookstores with contact enrichment.

Sample output

leads_qa.csv contains the verified Tattered Cover record from the seed run (50 products, 48 ISBN-pattern SKUs in catalog). A production version would source candidates from the Shopify Stores Directory, the Bookshop.org affiliate list, and niche bookstore directories; I'd expect that to yield roughly 50-150 verified bookstores.

Hire me to build this for your stack

Same patterns, your target site. Send the brief and I'll quote fixed-price within 24 hours.

info@luba.media