Eyal Rosenthal · Web scraping at scale

How to Scrape Indeed Job Listings in 2026

Indeed is the largest US job board and a frequent scraping target. It runs Cloudflare Turnstile on most pages, escalates its anti-bot measures at moderate request volume, and offers no public API for non-employer use cases.

The honest verdict first

| Use case | Recommendation |
|---|---|
| Get all jobs at one company | Search Indeed manually, export the page (10 mins) |
| Daily monitoring of <100 search queries | DIY with curl_cffi + 2-5s jitter |
| Bulk scrape (>1k jobs/day) | Apify Indeed Scraper actor (~$50-100/mo) |
| Build a job-aggregator product | Combine multiple sources (Indeed + LinkedIn + Glassdoor + niche boards); single-source aggregation is brittle |
| Real-time hiring signal | Subscribe to Indeed's official ATS partner API or use a service like LinkUp |

For most personal / research use cases, the DIY path works at low volume. For anything production-grade, use a managed service or partner API.

What Indeed serves vs hides

Indeed's pages have two layers:

  1. Initial HTML — search results page with job cards. Visible without JS, scrapable.
  2. Job detail pages — job description text, full company info. Requires Cloudflare Turnstile pass; some pages need JS execution.

For "list of jobs matching query [X] in city [Y]", the first layer is enough. For "full description text of job [Z]", you usually need a headless browser.

The minimal DIY scraper

import time
from random import uniform
from curl_cffi import requests as cf
from bs4 import BeautifulSoup

def indeed_search(query: str, location: str, n: int = 50) -> list[dict]:
    base = "https://www.indeed.com/jobs"
    results = []
    seen = set()  # job keys already collected; [data-jk] can match the same job twice
    start = 0
    while len(results) < n:
        params = {"q": query, "l": location, "start": start}
        r = cf.get(base, params=params, impersonate="chrome131", timeout=20)
        if r.status_code != 200:
            print(f"  blocked at start={start} (HTTP {r.status_code})")
            break

        soup = BeautifulSoup(r.text, "html.parser")
        cards = soup.select("[data-jk]")  # job-key attr on each result
        if not cards:
            break

        before = len(results)
        for card in cards:
            jk = card.get("data-jk")
            if jk in seen:  # skip repeats within a page or across pages
                continue
            seen.add(jk)
            title_el = card.select_one("h2 a span")
            company_el = card.select_one("[data-testid='company-name']")
            location_el = card.select_one("[data-testid='text-location']")
            results.append({
                "jk": jk,
                "url": f"https://www.indeed.com/viewjob?jk={jk}",
                "title": title_el.get_text(strip=True) if title_el else "",
                "company": company_el.get_text(strip=True) if company_el else "",
                "location": location_el.get_text(strip=True) if location_el else "",
            })
        if len(results) == before:  # nothing new on this page: stop rather than loop forever
            break

        start += 10  # offset of the next results page
        time.sleep(uniform(2.5, 5.0))  # heavy jitter; Indeed is sensitive

    return results[:n]

for r in indeed_search("python developer", "remote", n=50):
    print(r["title"], "—", r["company"], "—", r["location"])

Limits:

  • ~10-30 search-page fetches before Cloudflare challenges
  • Job description text not included — requires per-job-detail fetch
  • CSS selectors / data-testid attributes change every few months
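
One consequence of these limits: the search loop above stops silently when a page has no cards, so a Cloudflare challenge served with HTTP 200 looks the same as running out of results. A minimal sketch of telling the cases apart, assuming a challenge page embeds the same `challenges.cloudflare.com` marker the detail fetch checks for. `classify_page` and its labels are hypothetical names, not part of any library.

```python
CHALLENGE_MARKER = "challenges.cloudflare.com"  # assumed marker of a challenge page

def classify_page(status_code: int, html: str, card_count: int) -> str:
    """Label a search response as 'blocked', 'empty', or 'ok'."""
    if status_code != 200:
        return "blocked"      # hard block or rate limit
    if card_count > 0:
        return "ok"           # real results, even if a Turnstile script is embedded
    if CHALLENGE_MARKER in html:
        return "blocked"      # 200 response, but a challenge instead of results
    return "empty"            # end of results, or the selectors have changed

print(classify_page(200, "<div data-jk='abc'></div>", 1))
print(classify_page(200, "<script src='https://challenges.cloudflare.com/x.js'></script>", 0))
print(classify_page(200, "<html></html>", 0))
```

Logging "blocked" and "empty" separately makes it easier to notice when a run was cut short by anti-bot rather than by a genuinely short result set.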

Per-job detail extraction

def indeed_job_detail(jk: str) -> dict:
    url = f"https://www.indeed.com/viewjob?jk={jk}"
    r = cf.get(url, impersonate="chrome131", timeout=20)
    if r.status_code != 200:
        raise RuntimeError(f"HTTP {r.status_code} for jk={jk}")
    if "challenges.cloudflare.com" in r.text:
        raise RuntimeError("Cloudflare challenge: try residential proxy")

    soup = BeautifulSoup(r.text, "html.parser")

    def text_of(selector, sep=""):
        # Look the element up once; return None when nothing matches.
        el = soup.select_one(selector)
        return el.get_text(sep, strip=True) if el else None

    return {
        "jk": jk,
        "description": text_of("#jobDescriptionText", " ") or "",
        # Salary text sits in a <span> under the salary container; None if either is missing.
        "salary": text_of("[id*='salary'] span"),
        "job_type": text_of("[data-testid='inlineHeader-jobType']"),
    }

Hit each job's detail page with a 3-5 second delay. Maximum sustained rate: ~500-800 jobs/hour from a single residential IP before throttling.
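
The rate figure follows from simple arithmetic: a sequential scraper spends one pause plus one request per job. A rough sketch, assuming about 1.5 seconds per request (my assumption, not a measured number), together with a paced loop that keeps going when a single fetch fails. `jobs_per_hour` and `fetch_paced` are hypothetical helper names.

```python
import time
from random import uniform

def jobs_per_hour(delay_s: float, fetch_s: float) -> float:
    """Sequential throughput: each job costs one pause plus one request."""
    return 3600 / (delay_s + fetch_s)

def fetch_paced(jks, fetch, sleep=time.sleep, low=3.0, high=5.0):
    """Call fetch(jk) for each key, pausing low..high seconds between calls.

    Failures are recorded instead of raised, so one blocked page does not
    discard the jobs already collected.
    """
    done, failed = [], []
    for i, jk in enumerate(jks):
        if i:  # no pause before the first request
            sleep(uniform(low, high))
        try:
            done.append(fetch(jk))
        except RuntimeError as exc:
            failed.append((jk, str(exc)))
    return done, failed

# With the assumed ~1.5 s per request, a 3-5 s delay gives roughly:
print(round(jobs_per_hour(3, 1.5)), round(jobs_per_hour(5, 1.5)))  # 800 554
```

In use this would be called as `fetch_paced(job_keys, indeed_job_detail)`. The injectable `sleep` argument exists only so the loop can be tested without waiting.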

At scale: managed alternatives

For >1,000 jobs/day, the math favors paying:

Apify Indeed Scraper actor

apify.com/misceres/indeed-scraper — ~$0.50-2 per 1k records depending on configuration. Handles anti-bot, CAPTCHAs, retries. Drop-in run, output to CSV/JSON.
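
To make the per-record pricing concrete, here is a back-of-the-envelope estimate at the 1,000 jobs/day threshold. It counts per-record charges only, using the ~$0.50-2 per 1k range quoted above. Any platform subscription or proxy add-on is not modeled, and `monthly_record_cost` is a hypothetical helper name.

```python
def monthly_record_cost(jobs_per_day: int, usd_per_1k: float, days: int = 30) -> float:
    """Per-record charges only; subscriptions and add-ons are not modeled."""
    return jobs_per_day * days / 1000 * usd_per_1k

low = monthly_record_cost(1_000, 0.50)
high = monthly_record_cost(1_000, 2.00)
print(f"${low:.0f}-${high:.0f} per month")  # $15-$60 per month
```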

Bright Data Indeed dataset

If you want pre-scraped recent listings rather than real-time, Bright Data sells a refreshed Indeed dataset starting around $150/mo.

LinkUp

For real-time hiring signal, LinkUp crawls company-website career pages directly (not Indeed). Different data, much more legitimate sourcing. Enterprise-priced.

ATS partner APIs

If you're building a product for employers, Greenhouse / Lever / Workable / SmartRecruiters all expose ATS data via partner APIs. Different audience but cleaner data.

What you should probably do instead

The most common Indeed-scraping briefs I see are:

  1. "Track new postings at companies X, Y, Z": better to monitor each company's careers page directly, using the portfolio_demos/competitor_watch/ pattern. Higher signal, lower legal risk, no anti-bot pain.
  2. "Get all software engineer jobs in [city]": use the Apify actor or LinkUp. Don't DIY.
  3. "Build a job aggregator like Glassdoor": combine Indeed + LinkedIn + Glassdoor + Otta + Remote OK + niche boards. Aggregating from one source is fragile.
  4. "Get job descriptions for analysis": use LinkUp's data feed or pay for Bright Data's pre-scraped corpus. Building this from Indeed will take weeks of maintenance.
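
For the first brief, whichever source you monitor, the core of "track new postings" is a set difference between two runs. A generic sketch follows; it is not the competitor_watch code itself, the posting IDs are made up, and how you extract IDs from a given careers page depends on the site and is not shown. All function names are hypothetical.

```python
import json
from pathlib import Path

def diff_postings(previous: set[str], current: set[str]) -> tuple[set[str], set[str]]:
    """Return (new, removed) posting IDs between two runs."""
    return current - previous, previous - current

def load_seen(path: Path) -> set[str]:
    """Read the IDs saved by the previous run; empty set on the first run."""
    return set(json.loads(path.read_text())) if path.exists() else set()

def save_seen(path: Path, ids: set[str]) -> None:
    """Persist the current IDs as a sorted JSON list for the next run."""
    path.write_text(json.dumps(sorted(ids)))

yesterday = {"eng-101", "eng-102", "ops-7"}
today = {"eng-102", "ops-7", "data-3"}
new, removed = diff_postings(yesterday, today)
print(sorted(new), sorted(removed))  # ['data-3'] ['eng-101']
```

Keeping one small state file per company means a daily run only has to report the `new` set.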

Indeed's ToS forbids automated access outside its official partnerships. Indeed has issued cease-and-desist letters but has rarely litigated against individuals.

Practical rules:

  1. Personal / research use is the safest territory
  2. Don't republish job descriptions wholesale — copyright sits with the employer that posted them
  3. Don't compete with Indeed using their data
  4. Use ATS partner APIs for any product that touches commercial use

If you have a job-listings brief, send it to info@luba.media. I'll usually recommend a non-Indeed alternative source that's cheaper and more reliable.

Hire me to build this for your site

I quote fixed-price and ship in 7-10 days. Send a brief to info@luba.media.
