Eyal Rosenthal · Web scraping at scale

Crossref DOI Bulk Extractor

Crossref DOI Bulk Extractor — Academic Publication Metadata + Citation Counts

Extract academic publication metadata via the Crossref REST API (Crossref is the DOI registration agency used by most academic publishers). Fits briefs of the form "extract metadata for [topic / DOI list]": universities, science publishers, citation-graph startups, journal index builders.

Built 2026-05-03 as Demo #42.

Run

. ~/freelance/.venv/bin/activate
cd ~/freelance/portfolio_demos/crossref_doi_extractor
python extract.py --queries "large language models,CRISPR Cas9,quantum supremacy" --rows 5
python extract.py --dois "10.1038/nature14539,10.1126/science.1259855"
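
The two invocations above line up with Crossref's two public REST endpoints: a free-text search on /works and a direct lookup on /works/{doi}. Below is a minimal sketch of how the comma-separated arguments could be turned into request URLs. The helper names (split_csv, query_url, doi_url) are hypothetical and are not taken from extract.py; only the endpoint paths and the query and rows parameters come from the Crossref API.

```python
from urllib.parse import quote, urlencode

# Public Crossref works endpoint.
API = "https://api.crossref.org/works"


def split_csv(value):
    """Split a comma-separated CLI argument, dropping blanks and padding."""
    return [part.strip() for part in value.split(",") if part.strip()]


def query_url(query, rows):
    """URL for a free-text search returning at most `rows` records."""
    return f"{API}?{urlencode({'query': query, 'rows': rows})}"


def doi_url(doi):
    """URL for a single-DOI lookup; the slash inside the DOI is kept as-is."""
    return f"{API}/{quote(doi, safe='/')}"


if __name__ == "__main__":
    for q in split_csv("large language models,CRISPR Cas9,quantum supremacy"):
        print(query_url(q, 5))
    for d in split_csv("10.1038/nature14539,10.1126/science.1259855"):
        print(doi_url(d))
```

Keeping URL construction separate from the network call makes the argument handling easy to check without touching the API.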

Result

  • 15 records across 3 queries ✅
  • Per-row: title, year, authors, author_count, container (journal/book), publisher, type, citation count (is_referenced_by_count), reference count, ISSN, ISBN, language, license URL, DOI, URL ✅
  • Free, no API key (Crossref only asks that requests identify themselves via the User-Agent header, ideally with a contact email) ✅
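
The per-row fields listed above can be derived from a Crossref work record roughly as follows. This is a sketch under assumptions, not the code in extract.py: the function names, the "; " separator, the output key names, and the User-Agent string are illustrative choices. The input keys ("is-referenced-by-count", "references-count", "container-title", and so on) are the ones Crossref returns in its JSON.

```python
import json
from urllib.request import Request, urlopen

# Assumption: placeholder identity; substitute your own tool name and contact.
USER_AGENT = "crossref-demo/0.1 (mailto:you@example.org)"


def fetch_json(url):
    """GET a Crossref URL with an identifying User-Agent header.

    A single-DOI lookup returns the record under "message"; a search
    returns a list of records under message["items"].
    """
    req = Request(url, headers={"User-Agent": USER_AGENT})
    with urlopen(req, timeout=30) as resp:
        return json.load(resp)


def first(values):
    """First element of a possibly empty or missing list, else None."""
    return values[0] if values else None


def flatten_work(work):
    """Flatten one Crossref work record into a flat, CSV-friendly row."""
    authors = [
        " ".join(p for p in (a.get("given"), a.get("family")) if p)
        or a.get("name", "")  # organisational authors carry only "name"
        for a in work.get("author", [])
    ]
    date_parts = work.get("issued", {}).get("date-parts", [[None]])
    year = date_parts[0][0] if date_parts and date_parts[0] else None
    lic = first(work.get("license", []))
    return {
        "title": first(work.get("title", [])),
        "year": year,
        "authors": "; ".join(authors),
        "author_count": len(authors),
        "container": first(work.get("container-title", [])),
        "publisher": work.get("publisher"),
        "type": work.get("type"),
        "is_referenced_by_count": work.get("is-referenced-by-count"),
        "reference_count": work.get("references-count"),
        "issn": "; ".join(work.get("ISSN", [])),
        "isbn": "; ".join(work.get("ISBN", [])),
        "language": work.get("language"),
        "license_url": lic.get("URL") if lic else None,
        "doi": work.get("DOI"),
        "url": work.get("URL"),
    }
```

Every lookup uses .get with a default because Crossref records are sparse: many works have no license, ISBN, or language, and some have an empty issued date.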

Hire me to build this for your stack

Same patterns, your target site. Send the brief and I'll quote fixed-price within 24 hours.

info@luba.media