Eyal Rosenthal · Web scraping at scale

How to Scrape Reddit in 2026: Use the Official API (It's Cheap and the Workarounds Aren't)

How to Scrape Reddit in 2026

Short version: use the API, not scraping. Reddit's API is one of the friendlier ones on the public web, even after the 2023 changes.

This guide covers the API path (the right answer for 95% of use cases), what changed in 2023, and the narrow case for DIY scraping.

What changed in 2023

Reddit closed third-party app access (killing Apollo, Reddit Is Fun, etc.) and tightened ToS to forbid most scraping. Their official API is now the canonical path.

But: the official API is still free for personal/development use and remains generously priced for commercial use compared to Twitter/X's equivalent. Most people who ask "how do I scrape Reddit" are skipping past the API because they didn't realize it's still accessible.

Use PRAW — the Python Reddit API Wrapper, maintained by Reddit's community.

Setup (5 minutes)

  1. Go to reddit.com/prefs/apps
  2. Click "create app" → choose "script" type
  3. Name it whatever, redirect URL http://localhost:8080
  4. Copy the client_id (under the app name) and client_secret
pip install praw

Read 100 hot posts from a subreddit

import praw

reddit = praw.Reddit(
    client_id="your_client_id",
    client_secret="your_client_secret",
    user_agent="my-research-script/0.1 by u/your_username",
)

for submission in reddit.subreddit("webscraping").hot(limit=100):
    print(f"[{submission.score}] {submission.title}")
    print(f"  https://reddit.com{submission.permalink}")
    print(f"  {submission.url}")

That's the whole thing for browse-and-collect. PRAW handles auth, pagination, rate-limit-respect, retries.

Search across Reddit

for submission in reddit.subreddit("all").search("self-healing scraper", limit=50):
    print(submission.title, submission.subreddit_name_prefixed)

Get all comments on a post

submission = reddit.submission(id="abc123")
submission.comments.replace_more(limit=None)  # expand all "MoreComments"
for comment in submission.comments.list():
    print(f"[{comment.score}] {comment.body[:200]}")

Stream new posts in real-time

for submission in reddit.subreddit("webscraping").stream.submissions():
    print(f"NEW: {submission.title}")
    # do something — push to Slack, write to DB, etc.

What the API gives you (and what scraping won't easily)

  • Author info (submission.author, comment.author)
  • Vote counts (current state, not historical)
  • Awards
  • Crosspost relationships
  • Comment trees with full nesting
  • Real-time streams (no polling needed)

Rate limits

Reddit's free tier:

  • 60 requests per minute (roughly 100,000/day)
  • More than enough for monitoring 50-100 subreddits

If you exceed it, PRAW respects Retry-After headers automatically and backs off. You won't get banned at this tier; you'll just slow down.

For commercial high-volume use (>100k req/day), Reddit has paid API tiers (~$0.25 per 1k requests, similar to Twitter's pricing scheme).

What about Pushshift?

Pushshift.io used to be the canonical "search every Reddit comment ever" archive. Reddit cut their access in 2023; the public Pushshift API is dead.

Academic Pushshift access still exists but requires university affiliation and a research justification. If you have it, use it for historical search; otherwise the official API's 1000-result-deep search is your only option.

The narrow DIY case

The old.reddit.com path:

from curl_cffi import requests as cf
r = cf.get("https://old.reddit.com/r/webscraping/.json?limit=100",
           impersonate="chrome131",
           headers={"User-Agent": "research-script/0.1"})
data = r.json()
posts = data["data"]["children"]

Reddit serves a JSON view of any old.reddit.com URL by appending .json. No login required. Returns up to 100 items per request, paginated via after= parameter.

When this is useful:

  • You don't want to register an API client
  • You're doing a one-off lookup of a few subreddits
  • You're rate-limited under 1 request per 2 seconds

When it breaks:

  • Subreddits that require login (NSFW, private)
  • High volume — Reddit will rate-limit you, then ban your IP
  • Comment trees — much more complex via JSON than via PRAW
  • Anything beyond the most-recent-1000-items per source

Use the API instead.

Reddit's ToS allows automated access via their API. Direct HTTP scraping (the .json trick) is in a grey zone — technically allowed for personal use, technically forbidden by their broader API ToS at scale. Reddit hasn't pursued individual scrapers, but has cut off larger third parties (e.g., Pushshift).

The clear rule: use the API for anything you're going to deploy.

What to build with this

A few high-leverage Reddit-data ideas (all easy with PRAW):

  • Brand mention monitor — stream reddit.subreddit("all").stream.submissions() filtered by your brand keywords
  • Niche-community trend tracker — monitor 10-20 subreddits for emerging topics
  • Sentiment analysis — feed comments into Claude/GPT for theme extraction
  • Competitor product feedback — scrape r/product_recommendations, r/buyitforlife for product mentions
  • Community Q&A archive — back up valuable threads before they're deleted

If you have a Reddit-data use case you're not sure how to structure, send to info@luba.media. Most cases I can recommend a ready-made open-source pattern.

Hire me to build this for your site

I quote fixed-price and ship in 7-10 days. Send a brief to info@luba.media.

Send a brief