Eyal Rosenthal · Web scraping at scale

Dental IT MSP Lead Finder — Bulk-Extract + Tier-Score B2B Leads from Public Sources

Dental IT / MSP Lead Finder

Bulk lead-list builder for dental-vertical IT / MSP companies in the US and Canada. It sweeps DuckDuckGo organic results across 20 metros × 4 query variants (80 queries in total), fetches each candidate's homepage, applies a dual-keyword filter (the page must mention both dental and IT-services terms), captures contact data, and deduplicates by domain.
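
The query sweep, dual-keyword filter, and domain dedup described above can be sketched in a few pure functions. This is a minimal illustration, not the code in extract.py: the function names, the term lists, and the query templates are all assumptions, and the dedup key here is just the hostname with a leading "www." stripped, which is a simplification of a true registered-domain lookup (that would need the Public Suffix List).

```python
import re
from urllib.parse import urlparse

# Hypothetical term lists; the real extract.py may use different ones.
DENTAL_RE = re.compile(r"\b(dental|dentists?|dentistry)\b", re.IGNORECASE)
IT_RE = re.compile(
    r"\b(managed it|it support|it services|msp|managed services?)\b",
    re.IGNORECASE,
)


def build_queries(metros, variants):
    """Expand every metro x query-variant combination into a search string."""
    return [variant.format(metro=metro) for metro in metros for variant in variants]


def passes_dual_filter(page_text):
    """Keep a page only if it mentions BOTH a dental term and an IT-services term."""
    return bool(DENTAL_RE.search(page_text)) and bool(IT_RE.search(page_text))


def domain_key(url):
    """Dedup key: lowercased hostname without a leading 'www.' (simplified)."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def dedupe_by_domain(leads):
    """Keep the first lead seen for each domain key, preserving input order."""
    seen = set()
    unique = []
    for lead in leads:
        key = domain_key(lead["website"])
        if key and key not in seen:
            seen.add(key)
            unique.append(lead)
    return unique
```

With 20 metros and 4 templates such as "dental IT support {metro}", build_queries returns the 80 search strings; each fetched homepage's text then goes through passes_dual_filter before the surviving rows are passed to dedupe_by_domain.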

Run

pip install requests beautifulsoup4
python3 extract.py

Output: leads.csv with columns:

  • name: company / brand name (parsed from page / <h1>)
  • website: final URL after redirects
  • country: US / CA
  • region: metro the lead was found in
  • services_blurb: meta description or first body paragraph
  • contact_email: address from a mailto: link, or the first email address found in the page
  • phone: number from a tel: link, or the first US-pattern phone number
  • contact_page: link to the company contact page, if present
  • source_query: which DDG query surfaced this lead

Sample output

See leads.csv (20+ leads, real, runnable today).

Why this approach

  • No paid APIs: DuckDuckGo HTML search is free and serves no captcha at normal pacing
  • Dedup by registered domain: avoids duplicate listings across queries
  • Dual-keyword filter (dental ∧ IT-services): cuts the false-positive rate to near zero, because generic IT shops and non-MSP dental sites are rejected at parse time
  • Idempotent + scriptable: re-runnable, and can be scheduled as a recurring lead-refresh pipeline

Extending

  • Swap DDG for Google Custom Search ($5 / 1k queries) for better recall
  • Add Yelp, BBB, and LinkedIn site-search as additional sources
  • Add a quality-tier flag (Gold / Silver / Bronze) based on contact-data completeness
  • Add a Slack alert pipeline that posts new leads weekly

Real Upwork brief this maps to

See PROPOSAL.md, written for a $200 fixed-price brief asking for 100 qualified dental IT / MSP companies in US + Canada.

Hire me to build this for your stack

Same patterns, your target site. Send the brief and I'll quote fixed-price within 24 hours.

info@luba.media

One-stop shop for web scraping

46 production demos, 10 deep tutorials, written by someone who runs a €500K/yr data business in Spain on his own scrapers. Native English, async-only, fixed-price preferred.

Source on GitHub: https://github.com/luba-media/freelance

© Eyal Rosenthal 2026. Madrid, Spain. Hosted on Vercel.