Eyal Rosenthal · Web scraping at scale

How to Scrape Twitter / X in 2026 (Honest: Don't, Use the API)

Short version: don't. Use the official API.

This is the most-asked-about and least-recommended scraping target on the public web. Here's why, what the alternatives actually look like, and the narrow cases where DIY scraping might still be defensible.

What changed in 2023

Twitter/X did three things in 2023 that broke the whole DIY-scraping ecosystem:

  1. Closed public read API access. Pre-2023, anyone with a free key could read tweets. Now: minimum $100/mo for the basic developer tier, $5,000/mo for full access.
  2. Required login for almost every page. Search, profiles, and individual tweets all sit behind the login wall, so anonymous scraping returns near-empty pages.
  3. Aggressively litigated open-source scraping libraries. twint, snscrape and similar tools were broken (and their maintainers received C&D letters). The major maintained scrapers are now defunct or deprecated.

The result: every "How to scrape Twitter" tutorial older than mid-2023 is wrong.

What works in 2026 (approximate ranking)

| Approach | Cost | Risk | Recommendation |
|---|---|---|---|
| Official X API (Basic tier) | $100-200/mo | None (legal, ToS-compliant) | Default for most use cases |
| Official X API (Pro tier) | $5,000/mo | None | Enterprise / high-volume only |
| Apify Twitter scraper actor | ~$20-50/mo | Medium ToS, low legal | Fast prototype, low volume |
| Bright Data Twitter dataset | $100-1,000/mo | Low (their ToS, not yours) | Bulk historical data |
| DIY with logged-in session | Variable | High account-ban + ToS | Last resort, brittle |

Twitter's developer API is what you should be using.

import tweepy

client = tweepy.Client(bearer_token="YOUR_TOKEN")

# Search recent tweets matching a query
tweets = client.search_recent_tweets(
    query="web scraping -is:retweet lang:en",
    max_results=100,
    tweet_fields=["created_at", "public_metrics", "author_id"],
)
for tweet in tweets.data or []:
    print(tweet.created_at, tweet.text[:120])

Pros: legal, well-documented, includes engagement metrics that scraping won't capture cleanly.

Cons: rate-limited (300-1500 requests per 15 min depending on tier), historical data is paywalled at $5k+/mo, search only goes back ~7 days on free/basic tiers.
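
Because those limits apply per 15-minute window, a long-running collector tends to be easier to keep healthy if it paces itself client-side instead of waiting to be rejected. Below is a minimal sketch of a rolling-window limiter using only the standard library. The class name, the injectable clock, and the 300-call budget mentioned afterwards are illustrative assumptions, not part of tweepy or the X API. (tweepy's Client also accepts wait_on_rate_limit=True, which sleeps once the API reports the limit is exhausted.)

```python
import time
from collections import deque


class WindowLimiter:
    """Allow at most `max_calls` calls in any rolling `window_s` seconds.

    Hypothetical helper for illustration; clock and sleep are injectable
    so the pacing logic can be exercised without real waiting.
    """

    def __init__(self, max_calls, window_s=15 * 60,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.window_s = window_s
        self._clock = clock
        self._sleep = sleep
        self._calls = deque()  # timestamps of calls still inside the window

    def wait_time(self):
        """Seconds until the next call is allowed (0.0 if allowed now)."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.window_s:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            return 0.0
        return self.window_s - (now - self._calls[0])

    def acquire(self):
        """Block until a call is allowed, then record it."""
        while True:
            delay = self.wait_time()
            if delay <= 0:
                break
            self._sleep(delay)
        self._calls.append(self._clock())
```

In a collector you would create one limiter (for example WindowLimiter(300), if 300 is your tier's budget) and call limiter.acquire() immediately before each client.search_recent_tweets(...) request.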

What the API genuinely doesn't give you

A short list of cases where the API is a poor fit:

  • Historical search beyond 7 days on the basic tier (Pro tier or Twitter's Academic Research access handles this; both are gated)
  • Replies to a specific tweet in a clean structured way (the API returns conversation threads but it's awkward)
  • "View counts" on tweets that the web UI shows
  • Spaces transcripts (audio rooms)
  • Community posts (the new subreddit-shaped feature)

For these, the alternatives are:

  1. Apify's Twitter actor — community-maintained, ToS-grey, ~$20-50/mo for moderate use. Works today; expect breakage when X changes the front-end.
  2. Bright Data Twitter dataset — pre-scraped, refreshed periodically. Their lawyers have figured out the ToS angle.
  3. Direct partnership / X Pro account. If you're a real business, paying for access is usually the cleanest route.
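
On the reply-threading point, staying on the API is still possible. One workable pattern is to pull the tweets in a conversation (the v2 search query supports a conversation_id: operator, and tweets can carry a referenced_tweets field) and rebuild the tree yourself. Here is a rough sketch that works on plain dicts. The helper names and sample data are hypothetical, and the dict shape is my assumption of what you'd have after converting the API objects.

```python
from collections import defaultdict


def build_reply_tree(tweets):
    """Map each parent tweet id to the ids of its direct replies."""
    children = defaultdict(list)
    for tweet in tweets:
        for ref in tweet.get("referenced_tweets") or []:
            if ref["type"] == "replied_to":
                children[ref["id"]].append(tweet["id"])
    return dict(children)


def walk(tree, root_id, depth=0):
    """Yield (depth, tweet_id) pairs in depth-first order."""
    yield depth, root_id
    for child_id in tree.get(root_id, []):
        yield from walk(tree, child_id, depth + 1)


# Hypothetical sample: one root tweet, two direct replies, one nested reply.
sample = [
    {"id": "1", "text": "root"},
    {"id": "2", "text": "reply",
     "referenced_tweets": [{"type": "replied_to", "id": "1"}]},
    {"id": "3", "text": "nested reply",
     "referenced_tweets": [{"type": "replied_to", "id": "2"}]},
    {"id": "4", "text": "second reply",
     "referenced_tweets": [{"type": "replied_to", "id": "1"}]},
]

tree = build_reply_tree(sample)
for depth, tweet_id in walk(tree, "1"):
    print("  " * depth + tweet_id)
```

It's still awkward, since you page through the whole conversation before you can see its structure, but it keeps you inside the ToS.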

If you absolutely must roll your own:

# This is shown for educational completeness. Not recommended in production.
from playwright.sync_api import sync_playwright

def scrape_user_tweets(username: str, session_dir: str = "./twitter_session"):
    with sync_playwright() as p:
        # A persistent context keeps cookies in session_dir between runs.
        # First run: launch with headless=False and log in interactively once.
        # Subsequent runs reuse the saved session.
        context = p.chromium.launch_persistent_context(
            user_data_dir=session_dir,
            headless=True,
        )
        page = context.new_page()
        page.goto(f"https://x.com/{username}")
        page.wait_for_selector("article[data-testid='tweet']", timeout=15_000)

        tweets = []
        seen = set()  # articles stay in the DOM across scrolls, so dedupe
        # Scroll-load: X uses infinite scroll. Capture as we scroll.
        last_height = 0
        for _ in range(20):
            for art in page.query_selector_all("article[data-testid='tweet']"):
                text_el = art.query_selector("div[data-testid='tweetText']")
                if not text_el:
                    continue
                text = text_el.inner_text()[:280]
                if text not in seen:  # crude: dedupes on text, not tweet id
                    seen.add(text)
                    tweets.append({"text": text})
            # Scroll down and give the next batch time to render
            page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
            page.wait_for_timeout(1500)
            new_height = page.evaluate("document.body.scrollHeight")
            if new_height == last_height:
                break
            last_height = new_height

        context.close()
        return tweets

Why this is risky:

  • Your account gets shadowbanned or terminated if X detects the automation pattern
  • The DOM selectors change frequently — expect 1-2 days/month of maintenance
  • ToS violation is unambiguous; if X's lawyers wanted to escalate they could
  • Doesn't scale — you're rate-limited by what one logged-in browser can do

If you do this, use a throwaway account, not your real one.

Twitter/X is an explicit no-scrape zone post-2023. The risks:

  1. ToS contract claim — you logged in and accepted terms; you're contractually bound.
  2. Account termination — common, swift, no recourse.
  3. CFAA / DMCA escalation — historically rare but not zero. Musk-era X has shown more litigation appetite than pre-acquisition Twitter.
  4. Data redistribution — even if you scrape successfully, sharing/selling/republishing the data adds copyright + ToS risk.

What I'd recommend instead by use case

"I want to monitor mentions of my brand." → X API Basic tier, search_recent_tweets() with brand keywords. $100/mo. Done.

"I want to track sentiment on a topic over time." → X API Basic tier + your own DB to accumulate the data. $100/mo + storage.

"I want historical Twitter data for a research project." → Apply for academic research access if X still offers it to your institution (historically free with a university affiliation; verify current availability before planning around it), or pay X Pro ($5,000/mo), or buy a pre-scraped dataset from Bright Data/Apify.

"I want to back up my own tweets." → X has a built-in archive download in account settings. No scraping needed.

"I want competitor-tweet engagement metrics." → X API public metrics include retweet/like/reply counts. Sufficient for most analysis.

"I want to extract leads from people who tweeted about my topic." → X API search → output list. Legitimate use case, fine on Basic tier.
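
For the sentiment-tracking case above, the "your own DB" part can be very small. Since recent search only reaches back about a week, the point is to poll regularly and accumulate results without duplicates. Here is a minimal sketch using sqlite3 from the standard library. The table layout, function names, and row shape are assumptions for illustration, and the upsert syntax needs SQLite 3.24 or newer.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the tweet store. Hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tweets (
               id TEXT PRIMARY KEY,
               created_at TEXT,
               text TEXT,
               like_count INTEGER
           )"""
    )
    return conn


def save_tweets(conn, rows):
    """Insert new tweets; refresh the like count for ones already stored."""
    conn.executemany(
        """INSERT INTO tweets (id, created_at, text, like_count)
           VALUES (:id, :created_at, :text, :like_count)
           ON CONFLICT(id) DO UPDATE SET like_count = excluded.like_count""",
        rows,
    )
    conn.commit()


def daily_counts(conn):
    """Number of stored tweets per calendar day (from ISO 8601 timestamps)."""
    return conn.execute(
        "SELECT substr(created_at, 1, 10) AS day, COUNT(*) "
        "FROM tweets GROUP BY day ORDER BY day"
    ).fetchall()
```

Each polling run converts the API results into dicts with those four keys and passes them to save_tweets(). Overlapping runs are harmless because the tweet id is the primary key.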

If you're trying to build something Twitter-data-shaped and the API isn't fitting, send the use case to info@luba.media. Often there's a legitimate alternate path I can recommend (Bluesky / Mastodon / Reddit / niche-API solutions) that gives you the same outcome without the ToS minefield.
