SEC EDGAR Bulk Financial Filings Extractor
Built specifically against Upwork job ~022050416 "SEC EDGAR Extraction" (US, fixed-price, 5-10 proposals, $700+ verified client, posted 2026-05-02 on the freelance pipeline shortlist).
The brief class: take a list of tickers, pull the most recent 10-K / 10-Q / 8-K filings plus structured XBRL financial facts (revenue, net income, balance-sheet items), and output a clean CSV.
Built 2026-05-03 as Demo #25 — the first hybrid-mode (real-Upwork-job-mapped) demo.
Run
. ~/freelance/.venv/bin/activate
cd ~/freelance/portfolio_demos/sec_edgar_extractor
python extract.py # uses tickers.txt
python extract.py --tickers AAPL,MSFT,NVDA,GOOGL # ad hoc
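The two invocations above imply a ticker source that falls back to tickers.txt when --tickers is absent. A minimal sketch of that resolution step, assuming a one-ticker-per-line file with optional # comments; parse_tickers and the file format are hypothetical, not necessarily what extract.py does:

```python
import argparse
from pathlib import Path


def parse_tickers(argv=None, default_file="tickers.txt"):
    """Return upper-cased, de-duplicated tickers from --tickers or the default file."""
    parser = argparse.ArgumentParser(description="SEC EDGAR bulk extractor (sketch)")
    parser.add_argument("--tickers", help="comma-separated list, e.g. AAPL,MSFT")
    args = parser.parse_args(argv)

    if args.tickers:
        raw = args.tickers.split(",")
    else:
        # Assumed file format: one ticker per line, '#' starts a comment.
        raw = Path(default_file).read_text().splitlines()

    tickers = []
    for item in raw:
        symbol = item.split("#", 1)[0].strip().upper()
        if symbol and symbol not in tickers:
            tickers.append(symbol)
    return tickers
```

Calling parse_tickers(["--tickers", "AAPL,MSFT"]) bypasses the file entirely, which keeps ad hoc runs independent of whatever is checked in as tickers.txt.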
Result (10 mega-cap public companies)
- 10 / 10 tickers extracted, 0 failures ✅
- Per-ticker structured fields: company name, CIK, SIC code, industry, state of incorporation, fiscal year end ✅
- Latest annual XBRL facts (USD): Revenues, NetIncome, Assets, Liabilities, Equity, Cash ✅
- Latest 10-K + 10-Q + 8-K filing dates + accession numbers + direct URLs ✅
- 0.15s sleep between calls (well under SEC's 10 req/s rate limit) ✅
- Tenacity retries with exponential backoff (4-15s) ✅
Sample output:
| ticker | company | revenues | net income | 10-K filed |
|---|---|---|---|---|
| AAPL | Apple Inc. | $265.6B | $112.0B | 2025-10-31 |
| MSFT | Microsoft Corp | $62.5B | $101.8B | 2025-07-30 |
| NVDA | NVIDIA Corp | $215.9B | $120.1B | 2026-02-25 |
| GOOGL | Alphabet Inc. | $402.8B | $132.2B | 2026-02-05 |
| AMZN | Amazon.com | $716.9B | $77.7B | 2026-02-06 |
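The retry behaviour listed in the results relies on Tenacity. The sketch below is a standard-library stand-in, not the extractor's code, that shows the same idea: waits that double from 4s and are capped at 15s. The name call_with_backoff and the attempt count are assumptions made for illustration.

```python
import time


def call_with_backoff(fn, attempts=4, min_wait=4.0, max_wait=15.0, sleep=time.sleep):
    """Call fn(); on failure wait min_wait, 2*min_wait, ... capped at max_wait."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            # Exponential backoff: 4s, 8s, then clamped to 15s.
            sleep(min(max_wait, min_wait * 2 ** attempt))
```

Passing sleep as a parameter lets the schedule be checked in tests without actually waiting; in production the default time.sleep applies.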
Why the multi-candidate XBRL field lookup
XBRL is a moving standard — companies report "revenue" under different tag names depending on adoption year and industry:
("Revenues", ["Revenues",
"RevenueFromContractWithCustomerExcludingAssessedTax",
"SalesRevenueNet"])
The extractor tries each candidate in order and uses the first that has FY data. This is the difference between a demo that works on AAPL and one that works on AAPL + biotechs + insurers + financial services.
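A minimal sketch of that lookup, assuming the input is the JSON returned by the SEC companyfacts endpoint (facts → us-gaap → tag → units → USD → list of entries carrying end, val, fp and filed). The function name latest_fy_value is hypothetical, and XBRL_FIELDS here holds only the Revenues entry shown above:

```python
XBRL_FIELDS = [
    ("Revenues", ["Revenues",
                  "RevenueFromContractWithCustomerExcludingAssessedTax",
                  "SalesRevenueNet"]),
]


def latest_fy_value(facts, candidates, unit="USD"):
    """Return (tag, entry) for the first candidate tag with FY data, else None."""
    us_gaap = facts.get("facts", {}).get("us-gaap", {})
    for tag in candidates:
        entries = us_gaap.get(tag, {}).get("units", {}).get(unit, [])
        annual = [e for e in entries if e.get("fp") == "FY"]
        if annual:
            # Newest period end wins; ties go to the most recently filed entry.
            return tag, max(annual, key=lambda e: (e["end"], e.get("filed", "")))
    return None  # none of the candidate tags has annual data
```

Returning the matched tag alongside the value makes it easy to log which candidate was used per company, which helps when a figure looks off because an older tag matched first.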
Adapting to a different brief
- Different forms: edit find_filing(submissions, "10-K"); it also accepts 10-Q, 8-K, S-1, DEF 14A, etc.
- More fields: append to XBRL_FIELDS; common asks are OperatingCashFlow, CapitalExpenditures, LongTermDebt, EarningsPerShareBasic.
- Quarterly instead of annual: change the fp == "FY" filter to fp.startswith("Q").
- Full-text 10-K parsing: extend the pipeline to fetch the primary document URL and parse it with pdfplumber / BeautifulSoup for narrative sections (Risk Factors, MD&A).
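For the form swap, a sketch of what a find_filing helper could look like. Only the call signature comes from this README; the body is an assumption built on the layout of the data.sec.gov submissions JSON, where filings.recent holds parallel arrays (form, filingDate, accessionNumber, primaryDocument):

```python
def find_filing(submissions, form):
    """Return the most recent filing of `form` as a dict, or None if absent."""
    recent = submissions.get("filings", {}).get("recent", {})
    matches = [i for i, f in enumerate(recent.get("form", [])) if f == form]
    if not matches:
        return None

    # Pick by filing date rather than relying on the array order.
    i = max(matches, key=lambda idx: recent["filingDate"][idx])
    accession = recent["accessionNumber"][i]
    cik = int(submissions["cik"])          # drops any zero padding
    folder = accession.replace("-", "")    # archive folders omit the dashes
    document = recent["primaryDocument"][i]
    return {
        "form": form,
        "filed": recent["filingDate"][i],
        "accession": accession,
        "url": f"https://www.sec.gov/Archives/edgar/data/{cik}/{folder}/{document}",
    }
```

Because the form type is a plain string comparison, find_filing(submissions, "DEF 14A") works the same way as the 10-K case; a company with no such filing simply yields None.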
SEC compliance
- Public APIs only: data.sec.gov, www.sec.gov. No login, no auth.
- Descriptive User-Agent header (SEC requires identification).
- Rate limit respected: a 0.15s sleep between calls caps throughput at roughly 6.7 req/s, under SEC's 10 req/s cap.
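The compliance points above can be sketched with the standard library alone. The User-Agent string is a placeholder to be replaced with a real name and contact address, and build_request / fetch_json are hypothetical helper names rather than the extractor's own:

```python
import json
import time
import urllib.request

# Placeholder identity: replace with a real name and contact email.
USER_AGENT = "Example Research admin@example.com"
MIN_INTERVAL = 0.15  # seconds to pause after each call


def build_request(url, user_agent=USER_AGENT):
    """Attach the identifying User-Agent header to a request."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_json(url, pause=MIN_INTERVAL, timeout=30):
    """GET a public SEC JSON endpoint, then pause to stay under the rate cap."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        data = json.load(resp)
    time.sleep(pause)
    return data
```

A fixed pause after every call is the simplest way to stay under the cap: even with zero network latency, the request rate cannot exceed 1 / 0.15 per second.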
Hire me to build this for your stack
Same patterns, your target site. Send the brief and I'll quote fixed-price within 24 hours.
info@luba.media