SEC EDGAR Bulk Financial Filings Extractor

Built against Upwork job ~022050416, "SEC EDGAR Extraction" (US, fixed-price, 5-10 proposals, $700+ verified client, posted 2026-05-02), taken from the freelance pipeline shortlist.

The class of brief: given a list of tickers, pull the most recent 10-K, 10-Q, and 8-K filings plus structured XBRL financial facts (revenue, net income, balance-sheet items), and output a clean CSV.

Built 2026-05-03 as Demo #25, the first hybrid-mode demo (mapped to a real Upwork job).

Run

. ~/freelance/.venv/bin/activate
cd ~/freelance/portfolio_demos/sec_edgar_extractor
python extract.py                                  # uses tickers.txt
python extract.py --tickers AAPL,MSFT,NVDA,GOOGL   # ad hoc

Result (10 mega-cap public companies)

10 / 10 tickers extracted, 0 failures ✅
Per-ticker structured fields: company name, CIK, SIC code, industry, state of incorporation, fiscal year end ✅
Latest annual XBRL facts (USD): Revenues, NetIncome, Assets, Liabilities, Equity, Cash ✅
Latest 10-K + 10-Q + 8-K filing dates + accession numbers + direct URLs ✅
0.15s sleep between calls (well under SEC's 10 req/s rate limit) ✅
Tenacity retries with exponential backoff (4-15s) ✅
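
The retry behaviour in the last bullet can be sketched without the dependency. This is a stdlib stand-in, not Tenacity's API and not the code in extract.py: the names backoff_delays and call_with_retries, the attempt count, and the choice to retry on OSError (which covers urllib's URLError) are all assumptions. Only the 4-15s clamp comes from the results above.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(attempts: int, minimum: float = 4.0, maximum: float = 15.0) -> List[float]:
    """Waits between attempts: 2**n seconds, clamped to [minimum, maximum]."""
    return [min(maximum, max(minimum, 2.0 ** n)) for n in range(1, attempts)]


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 4,                      # assumed; the README does not state a count
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on OSError with clamped exponential backoff."""
    delays = backoff_delays(attempts)
    for i in range(attempts):
        try:
            return fn()
        except OSError:
            if i == attempts - 1:
                raise                       # out of attempts: surface the error
            sleep(delays[i])
    raise RuntimeError("attempts must be >= 1")
```

Passing sleep as a parameter keeps the function testable without real waiting.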

Sample output:

ticker	company	revenues	net income	10-K filed
AAPL	Apple Inc.	$265.6B	$112.0B	2025-10-31
MSFT	Microsoft Corp	$62.5B	$101.8B	2025-07-30
NVDA	NVIDIA Corp	$215.9B	$120.1B	2026-02-25
GOOGL	Alphabet Inc.	$402.8B	$132.2B	2026-02-05
AMZN	Amazon.com	$716.9B	$77.7B	2026-02-06

Why the multi-candidate XBRL field lookup

XBRL is a moving standard: companies report "revenue" under different tag names depending on adoption year and industry, so each output field maps to an ordered list of candidate tags:

("Revenues", ["Revenues",
              "RevenueFromContractWithCustomerExcludingAssessedTax",
              "SalesRevenueNet"])

The extractor tries each candidate in order and uses the first that has FY data. This is the difference between a demo that works on AAPL and one that works on AAPL + biotechs + insurers + financial services.
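
A minimal sketch of that lookup, assuming the input is the JSON returned by the data.sec.gov companyfacts endpoint (facts, then us-gaap, then tag, then units, then USD). The function name latest_fy_fact and the returned dict shape are hypothetical, not taken from extract.py. It also ignores a simplification: duration facts with the same end date but different start dates are not told apart.

```python
from typing import Any, Dict, List, Optional

# Output column -> us-gaap tags to try, in priority order (from the README).
XBRL_FIELDS = [
    ("Revenues", ["Revenues",
                  "RevenueFromContractWithCustomerExcludingAssessedTax",
                  "SalesRevenueNet"]),
]


def latest_fy_fact(
    companyfacts: Dict[str, Any],
    candidates: List[str],
    unit: str = "USD",
) -> Optional[Dict[str, Any]]:
    """Return the newest FY fact from the first candidate tag that has any."""
    us_gaap = companyfacts.get("facts", {}).get("us-gaap", {})
    for tag in candidates:
        rows = us_gaap.get(tag, {}).get("units", {}).get(unit, [])
        fy_rows = [r for r in rows if r.get("fp") == "FY"]
        if fy_rows:
            # ISO dates sort correctly as strings; break ties on filing date.
            best = max(fy_rows, key=lambda r: (r["end"], r.get("filed", "")))
            return {"tag": tag, "end": best["end"], "val": best["val"]}
    return None  # no candidate tag carries FY data for this company
```

One caveat of first-match ordering: a tag the company stopped using years ago still has FY data, so it is worth checking the returned end date before trusting the value.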

Adapting to a different brief

Different forms: change the form argument in find_filing(submissions, "10-K"); it also accepts 10-Q, 8-K, S-1, DEF 14A, etc.
More fields: append to XBRL_FIELDS. Common asks are OperatingCashFlow, CapitalExpenditures, LongTermDebt, EarningsPerShareBasic.
Quarterly instead of annual: change the fp == "FY" filter to fp.startswith("Q").
Full-text 10-K parsing: extend the pipeline to fetch the primary document URL and parse it with pdfplumber / BeautifulSoup for narrative sections (Risk Factors, MD&A).
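
For orientation, here is one way a find_filing with that signature could work. The body is a guess, not the code in extract.py. It assumes the data.sec.gov submissions JSON, where filings.recent holds parallel arrays (form, accessionNumber, filingDate, primaryDocument) that appear to be ordered newest first. The returned dict keys are hypothetical.

```python
from typing import Any, Dict, Optional


def find_filing(submissions: Dict[str, Any], form: str) -> Optional[Dict[str, str]]:
    """Return date, accession number, and document URL of the newest filing of a form type."""
    recent = submissions.get("filings", {}).get("recent", {})
    for i, filed_form in enumerate(recent.get("form", [])):
        if filed_form != form:
            continue
        accession = recent["accessionNumber"][i]
        cik = int(submissions["cik"])  # Archives paths use the CIK without zero padding
        url = (
            "https://www.sec.gov/Archives/edgar/data/"
            f"{cik}/{accession.replace('-', '')}/{recent['primaryDocument'][i]}"
        )
        return {
            "form": form,
            "filed": recent["filingDate"][i],
            "accession": accession,
            "url": url,
        }
    return None  # form type not present in the recent filings window
```

Because the match is an exact string comparison, amended forms such as 10-K/A are skipped unless requested by that name.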

SEC compliance

Public APIs only: data.sec.gov, www.sec.gov. No login, no auth.
Descriptive User-Agent header (SEC requires identification).
Rate limit respected: a 0.15s sleep between calls caps throughput at about 6.7 req/s, under SEC's 10 req/s cap.
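
The header and pacing rules above can be sketched with the standard library. This is an illustration under assumptions, not the extractor's code: USER_AGENT is a placeholder to replace with your own name and contact address, and Throttle, build_request, and fetch are hypothetical names. Unlike a fixed sleep after every call, this variant sleeps only for whatever remains of the 0.15s interval.

```python
import time
import urllib.request
from typing import Callable, Optional

USER_AGENT = "Example Research example@example.com"  # placeholder identity
MIN_INTERVAL = 0.15  # seconds between requests, about 6.7 req/s at most


class Throttle:
    """Enforce a minimum gap between consecutive calls to wait()."""

    def __init__(
        self,
        min_interval: float = MIN_INTERVAL,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def build_request(url: str, user_agent: str = USER_AGENT) -> urllib.request.Request:
    """Attach the identifying User-Agent header to a request."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url: str, throttle: Throttle) -> bytes:
    """Throttled GET; performs a real network call when invoked."""
    throttle.wait()
    with urllib.request.urlopen(build_request(url), timeout=30) as resp:
        return resp.read()
```

A single Throttle instance should be shared across all data.sec.gov and www.sec.gov calls so the limit applies to the whole run.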

Hire me to build this for your stack

Same patterns, your target site. Send the brief and I'll quote fixed-price within 24 hours.

info@luba.media