How to Scrape Indeed Job Listings in 2026
Indeed is the largest US job board and a frequent scraping target. It runs Cloudflare Turnstile on most pages, escalates its anti-bot measures at moderate request volumes, and offers no public API for non-employer use cases.
The honest verdict first
| Use case | Recommendation |
|---|---|
| Get all jobs at one company | Search Indeed manually, export the page (10 mins) |
| Daily monitoring of <100 search queries | DIY with curl_cffi + 2-5s jitter |
| Bulk scrape (>1k jobs/day) | Apify Indeed Scraper actor (~$50-100/mo) |
| Build a job-aggregator product | Combine multiple sources (Indeed + LinkedIn + Glassdoor + niche boards) — single-source aggregation is brittle |
| Real-time hiring signal | Subscribe to Indeed's official ATS partner API or use a service like LinkUp |
For most personal / research use cases, the DIY path works at low volume. For anything production-grade, use a managed service or partner API.
What Indeed serves vs hides
Indeed's pages have two layers:
- Initial HTML: the search results page with job cards. Visible without JS and scrapeable.
- Job detail pages: the job description text and full company info. These require passing Cloudflare Turnstile, and some also need JS execution.
For "list of jobs matching query [X] in city [Y]", the first layer is enough. For "full description text of job [Z]", you usually need a headless browser.
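The two layers map to two URL shapes, both taken from the scraper code in this article: `/jobs?q=...&l=...&start=...` for search pages and `/viewjob?jk=...` for detail pages. A small sketch of building them, assuming (as that code does) that the `start` offset advances 10 results per page; `search_url` and `detail_url` are hypothetical helper names.

```python
from urllib.parse import urlencode

INDEED = "https://www.indeed.com"
PAGE_SIZE = 10  # assumption: search results are paged in steps of 10 via `start`


def search_url(query: str, location: str, page: int = 0) -> str:
    """Layer 1: a search-results page (job cards, no full descriptions)."""
    params = {"q": query, "l": location, "start": page * PAGE_SIZE}
    return f"{INDEED}/jobs?{urlencode(params)}"


def detail_url(jk: str) -> str:
    """Layer 2: one job's detail page, keyed by its `data-jk` job key."""
    return f"{INDEED}/viewjob?{urlencode({'jk': jk})}"


print(search_url("python developer", "remote", page=2))
# https://www.indeed.com/jobs?q=python+developer&l=remote&start=20
print(detail_url("abc123"))
# https://www.indeed.com/viewjob?jk=abc123
```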
The minimal DIY scraper
```python
import time
from random import uniform

from bs4 import BeautifulSoup
from curl_cffi import requests as cf


def indeed_search(query: str, location: str, n: int = 50) -> list[dict]:
    base = "https://www.indeed.com/jobs"
    results = []
    start = 0  # Indeed paginates with a `start` offset, 10 results per page
    while len(results) < n:
        params = {"q": query, "l": location, "start": start}
        # impersonate= makes curl_cffi present Chrome's TLS/HTTP2 fingerprint
        r = cf.get(base, params=params, impersonate="chrome131", timeout=20)
        if r.status_code != 200:
            print(f"  blocked at start={start}")
            break
        soup = BeautifulSoup(r.text, "html.parser")
        cards = soup.select("[data-jk]")  # job-key attr on each result
        if not cards:
            break  # no more results, or a layout the selectors no longer match
        for card in cards:
            jk = card.get("data-jk")
            title_el = card.select_one("h2 a span")
            company_el = card.select_one("[data-testid='company-name']")
            location_el = card.select_one("[data-testid='text-location']")
            results.append({
                "jk": jk,
                "url": f"https://www.indeed.com/viewjob?jk={jk}",
                "title": title_el.get_text(strip=True) if title_el else "",
                "company": company_el.get_text(strip=True) if company_el else "",
                "location": location_el.get_text(strip=True) if location_el else "",
            })
        start += 10
        time.sleep(uniform(2.5, 5.0))  # heavy jitter: Indeed is sensitive
    return results[:n]


for job in indeed_search("python developer", "remote", n=50):
    print(job["title"], "|", job["company"], "|", job["location"])
```
Limits:
- Cloudflare typically challenges after roughly 10-30 search-page fetches
- Job description text is not included; it requires a separate fetch of each job's detail page
- CSS selectors and `data-testid` attributes change every few months
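Since Cloudflare can step in after roughly 10-30 fetches, it helps to detect a challenge explicitly. The search loop above treats any non-200 status as a block, while the per-job detail function further down looks for `challenges.cloudflare.com` in the body. A minimal sketch combining both checks; the status codes and the extra markers (`cf-turnstile`, the "Just a moment..." title) are assumptions based on typical Cloudflare interstitials, not a documented contract, and `looks_blocked` is a hypothetical name.

```python
# Heuristic markers of a Cloudflare interstitial. These are assumptions drawn
# from commonly seen challenge pages; update them when they drift.
CHALLENGE_MARKERS = (
    "challenges.cloudflare.com",
    "cf-turnstile",
    "<title>Just a moment...</title>",
)
BLOCK_STATUSES = {403, 429, 503}


def looks_blocked(status_code: int, body: str) -> bool:
    """True if a response is probably a block or challenge rather than results."""
    if status_code in BLOCK_STATUSES:
        return True
    # Also inspect the body, in case a challenge is served with a 200 status.
    return any(marker in body for marker in CHALLENGE_MARKERS)
```

In the search loop, `if looks_blocked(r.status_code, r.text): break` would take the place of the bare status check.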
Per-job detail extraction
```python
from typing import Optional

from bs4 import BeautifulSoup
from curl_cffi import requests as cf


def _text(soup: BeautifulSoup, selector: str) -> Optional[str]:
    """Stripped text of the first element matching `selector`, or None."""
    el = soup.select_one(selector)
    return el.get_text(" ", strip=True) if el else None


def indeed_job_detail(jk: str) -> dict:
    url = f"https://www.indeed.com/viewjob?jk={jk}"
    r = cf.get(url, impersonate="chrome131", timeout=20)
    if "challenges.cloudflare.com" in r.text:
        raise RuntimeError("Cloudflare challenge: try a residential proxy")
    soup = BeautifulSoup(r.text, "html.parser")
    return {
        "jk": jk,
        "description": _text(soup, "#jobDescriptionText") or "",
        # Query the <span> itself, so a salary container without one yields None
        "salary": _text(soup, "[id*='salary'] span"),
        "job_type": _text(soup, "[data-testid='inlineHeader-jobType']"),
    }
```
Hit each job's detail page with a 3-5 second delay. Maximum sustained rate: ~500-800 jobs/hour from a single residential IP before throttling.
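A detail crawl over many job keys should pace itself and stop at the first challenge instead of retrying into a harder block. The sketch below is hypothetical: `crawl_details` takes the fetcher as a parameter (for example `indeed_job_detail`, which raises `RuntimeError` on a challenge), sleeps a random 3-5 s between requests, and returns what it fetched plus the job keys still pending, so a later run (say, from a residential proxy) can resume.

```python
import time
from random import uniform
from typing import Callable


def crawl_details(
    jks: list[str],
    fetch: Callable[[str], dict],
    delay: tuple[float, float] = (3.0, 5.0),
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[list[dict], list[str]]:
    """Fetch detail records one at a time with jitter; stop at the first challenge.

    Returns (records fetched so far, job keys not yet fetched).
    """
    done: list[dict] = []
    for i, jk in enumerate(jks):
        try:
            done.append(fetch(jk))
        except RuntimeError:
            # Challenged: stop entirely and hand back the unfinished keys.
            return done, jks[i:]
        if i < len(jks) - 1:
            sleep(uniform(*delay))  # jittered gap before the next request
    return done, []
```

Typical use would be `records, pending = crawl_details(keys, indeed_job_detail)`, where `keys` is the list of `jk` values from a search run. The `sleep` parameter exists only so the pacing can be swapped out in tests.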
At scale: managed alternatives
For >1,000 jobs/day, the math favors paying:
Apify Indeed Scraper actor
apify.com/misceres/indeed-scraper — ~$0.50-2 per 1k records depending on configuration. Handles anti-bot, CAPTCHAs, retries. Drop-in run, output to CSV/JSON.
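At a flat per-record price, the managed side of that math is a single multiplication. The helper below is a hypothetical back-of-envelope calculator; it assumes 30-day months and ignores any subscription, proxy, or platform fees the actor's configuration might add.

```python
def monthly_cost_usd(records_per_day: int, usd_per_1k: float, days: int = 30) -> float:
    """Managed-scraper cost for one month at a flat per-1k-record price."""
    return records_per_day * days / 1000 * usd_per_1k


for price in (0.50, 2.00):  # the ~$0.50-2 per 1k range quoted above
    print(f"1,000 jobs/day at ${price:.2f}/1k: ${monthly_cost_usd(1000, price):.2f}/mo")
# 1,000 jobs/day at $0.50/1k: $15.00/mo
# 1,000 jobs/day at $2.00/1k: $60.00/mo
```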
Bright Data Indeed dataset
If you want pre-scraped recent listings rather than real-time, Bright Data sells a refreshed Indeed dataset starting around $150/mo.
LinkUp
For real-time hiring signal, LinkUp crawls company-website career pages directly (not Indeed). Different data, much more legitimate sourcing. Enterprise-priced.
ATS partner APIs
If you're building a product for employers, Greenhouse / Lever / Workable / SmartRecruiters all expose ATS data via partner APIs. Different audience but cleaner data.
What you should probably do instead
The most common Indeed-scraping briefs I see are:
- "Track new postings at companies X, Y, Z": better to monitor each company's careers page directly via the `portfolio_demos/competitor_watch/` pattern. Higher signal, lower legal risk, no anti-bot pain.
- "Get all software engineer jobs in [city]": use the Apify actor or LinkUp. Don't DIY.
- "Build a job aggregator like Glassdoor": combine Indeed + LinkedIn + Glassdoor + Otta + Remote OK + niche boards. Aggregating from one source is fragile.
- "Get job descriptions for analysis": use LinkUp's data feed or pay for Bright Data's pre-scraped corpus. Building this from Indeed will take weeks of maintenance.
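For the "track new postings" case, many companies host their careers page on an ATS with a public JSON feed. The sketch below is an independent illustration, not the code in `portfolio_demos/competitor_watch/`: it assumes the company uses Greenhouse, whose public Job Board API returns `{"jobs": [...]}` with `id`, `title`, `location.name`, and `absolute_url` fields. The board token `"acme"` and the function names are hypothetical; Lever and other ATSes would each need their own adapter.

```python
import json
from urllib.request import urlopen

# Greenhouse public Job Board API endpoint; {token} is the company's board token.
BOARD_URL = "https://boards-api.greenhouse.io/v1/boards/{token}/jobs"


def fetch_board(token: str) -> dict:
    """Download one company's current postings as parsed JSON."""
    with urlopen(BOARD_URL.format(token=token), timeout=20) as resp:
        return json.load(resp)


def new_postings(seen_ids: set[int], payload: dict) -> list[dict]:
    """Postings whose id was not present in the previous snapshot."""
    fresh = []
    for job in payload.get("jobs", []):
        if job["id"] not in seen_ids:
            fresh.append({
                "id": job["id"],
                "title": job.get("title", ""),
                "location": (job.get("location") or {}).get("name", ""),
                "url": job.get("absolute_url", ""),
            })
    return fresh
```

A daily job would call `fetch_board("acme")`, report `new_postings(seen, payload)`, then store `{job["id"] for job in payload["jobs"]}` as the next run's `seen` set. Diffing IDs rather than titles means a re-titled posting isn't reported twice.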
Legal & ToS
Indeed's ToS forbids automated access without an official partnership. Indeed has sent cease-and-desist letters but has rarely taken individuals to court.
Practical rules:
- Personal / research use is the safest territory
- Don't republish job descriptions wholesale — copyright sits with the employer that posted them
- Don't compete with Indeed using their data
- Use ATS partner APIs for any product that touches commercial use
What to read next
- How to Scrape LinkedIn — the harder hiring-data problem
- Web Scraping Tools Comparison — when paid services are cheaper than DIY
- The repo: `portfolio_demos/remoteok_jobs_monitor/`, the same pattern applied to the cleaner RemoteOK target
If you have a job-listings brief, send it to info@luba.media. I'll usually recommend a non-Indeed alternative source that's cheaper and more reliable.
Hire me to build this for your site
I quote fixed-price and ship in 7-10 days. Send a brief to info@luba.media.