Intermediate 4 min read · Updated 2026-05-05

How to Scrape Twitter / X in 2026 (Honest: Don't, Use the API)

Short version: don't. Use the official API.

This is the most-asked-about and least-recommended scraping target on the public web. Here's why, what the alternatives actually look like, and the narrow cases where DIY scraping might still be defensible.

What changed in 2023

Twitter/X did three things in 2023 that broke the whole DIY-scraping ecosystem:

Closed public read API access. Pre-2023, anyone with a free key could read tweets. Now: minimum $100/mo for the basic developer tier, $5,000/mo for full access.
Required login for almost every page. Search, profiles, individual tweets: all login-walled. Anonymous scraping returns near-empty pages.
Went after open-source scraping libraries. twint, snscrape and similar tools were broken (and their maintainers received C&D letters). Most of the formerly well-maintained scrapers are now defunct or deprecated.

The result: every "How to scrape Twitter" tutorial older than mid-2023 is wrong.

What works in 2026 (approximate ranking)

Approach	Cost	Risk	Recommendation
Official X API (Basic tier)	$100-200/mo	None (legal, ToS-compliant)	Default for most use cases
Official X API (Pro tier)	$5,000/mo	None	Enterprise / high-volume only
Apify Twitter scraper actor	~$20-50/mo	Medium ToS risk, low legal risk	Fast prototype, low volume
Bright Data Twitter dataset	$100-1,000/mo	Low (their ToS, not yours)	Bulk historical data
DIY with logged-in session	Variable	High (account ban, ToS violation)	Last resort, brittle

The official API path (recommended)

Twitter's developer API is what you should be using.

import tweepy

client = tweepy.Client(bearer_token="YOUR_TOKEN")

# Search recent tweets matching a query
tweets = client.search_recent_tweets(
    query="web scraping -is:retweet lang:en",
    max_results=100,
    tweet_fields=["created_at", "public_metrics", "author_id"],
)
for tweet in tweets.data or []:
    print(tweet.created_at, tweet.text[:120])

Pros: legal, well-documented, includes engagement metrics that scraping won't capture cleanly.

Cons: rate-limited (300-1500 requests per 15 min depending on tier), historical data is paywalled at $5k+/mo, search only goes back ~7 days on free/basic tiers.
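
To stay under a per-window request ceiling like the one above, a client-side throttle can help. The following is a minimal sketch of a sliding-window limiter, not part of tweepy or the X API; the class name `WindowLimiter` and its parameters are hypothetical, and the 15-minute window (900 seconds) is taken from the figures quoted in this article. tweepy's `Client` also accepts `wait_on_rate_limit=True`, which may be enough on its own.

```python
import time
from collections import deque


class WindowLimiter:
    """Allow at most `max_calls` calls per `window_s` seconds (sliding window).

    Hypothetical helper for illustration; not part of any X or tweepy API.
    """

    def __init__(self, max_calls, window_s=900.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.window_s = window_s
        self._clock = clock      # injectable so the limiter can be tested
        self._sleep = sleep
        self._calls = deque()    # timestamps of calls still inside the window

    def acquire(self):
        """Block until a call is allowed; return the seconds spent waiting."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.window_s:
            self._calls.popleft()
        waited = 0.0
        if len(self._calls) >= self.max_calls:
            # Wait until the oldest call leaves the window, then drop it.
            waited = self.window_s - (now - self._calls[0])
            self._sleep(waited)
            now = self._clock()
            self._calls.popleft()
        self._calls.append(now)
        return waited
```

In use, you would create one limiter per endpoint (for example `WindowLimiter(300)`, assuming the lower bound quoted above applies to your tier) and call `acquire()` before each API request.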

What the API genuinely doesn't give you

A short list of cases where the API is a poor fit:

Historical search beyond 7 days on the basic tier (the Pro tier handles this, as did Twitter's Academic Research access, though check whether that program is still offered; both are gated)
Replies to a specific tweet in a clean structured way (the API returns conversation threads but it's awkward)
"View counts" on tweets that the web UI shows
Spaces transcripts (audio rooms)
Community posts (the subreddit-style Communities feature)

For these, the alternatives are:

Apify's Twitter actor — community-maintained, ToS-grey, ~$20-50/mo for moderate use. Works today; expect breakage when X changes the front-end.
Bright Data Twitter dataset — pre-scraped, refreshed periodically. Their lawyers have figured out the ToS angle.
Direct partnership / X Pro account. If you're a real business, paying is usually the simplest route.
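
On the reply-thread point above: if you do pull a conversation through the API, the awkward part is turning the flat result list back into a tree. Here is a rough sketch of one way to do it, assuming each tweet is a plain dict carrying the v2 `referenced_tweets` field (which has to be requested via `tweet_fields`). The helper names `build_reply_tree` and `walk` and the sample data are made up for illustration.

```python
from collections import defaultdict


def build_reply_tree(tweets):
    """Map each parent tweet id to the ids of its direct replies."""
    children = defaultdict(list)
    for tweet in tweets:
        for ref in tweet.get("referenced_tweets", []):
            # Only "replied_to" links form the thread; skip quotes/retweets.
            if ref["type"] == "replied_to":
                children[ref["id"]].append(tweet["id"])
    return dict(children)


def walk(tree, root_id, depth=0):
    """Yield (depth, tweet_id) pairs depth-first, starting at root_id."""
    yield depth, root_id
    for child_id in tree.get(root_id, []):
        yield from walk(tree, child_id, depth + 1)


# Made-up sample shaped like a flat conversation result.
sample = [
    {"id": "2", "referenced_tweets": [{"type": "replied_to", "id": "1"}]},
    {"id": "3", "referenced_tweets": [{"type": "replied_to", "id": "2"}]},
    {"id": "4", "referenced_tweets": [{"type": "replied_to", "id": "1"}]},
    {"id": "5", "referenced_tweets": [{"type": "quoted", "id": "1"}]},
]

tree = build_reply_tree(sample)
for depth, tweet_id in walk(tree, "1"):
    print("  " * depth + tweet_id)
```

Replies whose parent is missing from the result set (deleted or protected tweets, or pages you did not fetch) will simply not appear under the root, so treat the tree as a best-effort view.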

DIY scraping path (last resort, not recommended)

If you absolutely must roll your own:

# This is shown for educational completeness. Not recommended in production.
import time

from playwright.sync_api import sync_playwright


def scrape_user_tweets(username: str, session_dir: str = "./twitter_session"):
    """Collect visible tweet text from a profile using a saved, logged-in session."""
    with sync_playwright() as p:
        # launch_persistent_context returns a BrowserContext, not a Browser.
        context = p.chromium.launch_persistent_context(
            user_data_dir=session_dir,
            headless=True,
        )
        try:
            page = context.new_page()
            # First run: set headless=False, log in interactively, and let the
            # session persist in session_dir. Subsequent runs reuse it.
            page.goto(f"https://x.com/{username}")
            page.wait_for_selector("article[data-testid='tweet']", timeout=15_000)

            tweets = []
            seen = set()  # the same article stays in the DOM across scrolls
            last_height = 0
            # Scroll-load: X uses infinite scroll. Capture as we scroll.
            for _ in range(20):
                for art in page.query_selector_all("article[data-testid='tweet']"):
                    text_el = art.query_selector("div[data-testid='tweetText']")
                    if not text_el:
                        continue
                    text = text_el.inner_text()[:280]
                    if text not in seen:  # dedupe by text; identical posts collapse
                        seen.add(text)
                        tweets.append({"text": text})
                # Scroll down and give new content time to load.
                page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
                time.sleep(1.5)
                new_height = page.evaluate("document.body.scrollHeight")
                if new_height == last_height:
                    break  # nothing new loaded
                last_height = new_height
            return tweets
        finally:
            # Close the context even if a selector times out.
            context.close()

Why this is risky:

Your account gets shadowbanned or terminated if X detects the automation pattern
The DOM selectors change frequently — expect 1-2 days/month of maintenance
ToS violation is unambiguous; if X's lawyers wanted to escalate they could
Doesn't scale — you're rate-limited by what one logged-in browser can do

If you do this, use a throwaway account, not your real one.

Legal & ethical territory

Twitter/X is an explicit no-scrape zone post-2023. The risks:

ToS contract claim — you logged in and accepted terms; you're contractually bound.
Account termination — common, swift, no recourse.
CFAA / DMCA escalation — historically rare but not zero. Musk-era X has shown more litigation appetite than pre-acquisition Twitter.
Data redistribution — even if you scrape successfully, sharing/selling/republishing the data adds copyright + ToS risk.

Common use cases and what to use for each

"I want to monitor mentions of my brand." → X API Basic tier, search_recent_tweets() with brand keywords. $100/mo. Done.
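
For the brand-monitoring case, most of the work is building a sensible query string to pass as `query=` to `search_recent_tweets()`. The sketch below shows one way to assemble it; `build_brand_query` is a hypothetical helper, and the 512-character default for `max_len` is an assumption about the Basic tier's query length limit that you should check against the current docs.

```python
def build_brand_query(terms, lang="en", exclude_retweets=True, max_len=512):
    """Build a recent-search query that matches any of the given brand terms.

    Hypothetical helper; max_len=512 is an assumed query length limit.
    """
    if not terms:
        raise ValueError("at least one search term is required")
    # Multi-word terms are quoted so they match as exact phrases.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    query = "(" + " OR ".join(quoted) + ")"
    if exclude_retweets:
        query += " -is:retweet"
    if lang:
        query += f" lang:{lang}"
    if len(query) > max_len:
        raise ValueError(f"query is {len(query)} chars, limit is {max_len}")
    return query


print(build_brand_query(["Acme", "Acme Widgets"]))
```

The `-is:retweet` and `lang:` operators are the same ones used in the tweepy example earlier in this article; splitting a long term list across several queries is one way around the length limit.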

"I want to track sentiment on a topic over time." → X API Basic tier + your own DB to accumulate the data. $100/mo + storage.
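
Because recent search only reaches back about a week, the "own DB" part matters: you poll regularly and store results so overlapping polls don't create duplicates. A minimal sketch with SQLite follows, assuming each tweet has been flattened to a dict with `id`, `created_at` (an ISO 8601 string), `text`, and `like_count`. The schema and function names are illustrative, not a prescribed layout.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the tweet store. The schema is illustrative only."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tweets (
               id TEXT PRIMARY KEY,
               created_at TEXT NOT NULL,
               text TEXT NOT NULL,
               like_count INTEGER NOT NULL DEFAULT 0
           )"""
    )
    return conn


def _count(conn):
    return conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]


def save_tweets(conn, rows):
    """Insert rows, skipping ids already stored; return how many were new."""
    before = _count(conn)
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO tweets (id, created_at, text, like_count) "
            "VALUES (:id, :created_at, :text, :like_count)",
            rows,
        )
    return _count(conn) - before


def daily_counts(conn):
    """Return (day, tweet_count) pairs, oldest day first."""
    return conn.execute(
        "SELECT substr(created_at, 1, 10) AS day, COUNT(*) "
        "FROM tweets GROUP BY day ORDER BY day"
    ).fetchall()
```

Note that tweepy returns `created_at` as a `datetime`, so you would call `.isoformat()` on it before storing. Sentiment scores would go in an extra column computed by whatever model you choose.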

"I want historical Twitter data for a research project." → Apply for Academic Research access if X still offers it (historically free, requires university affiliation), or pay X Pro ($5,000/mo), or buy a pre-scraped dataset from Bright Data/Apify.

"I want to back up my own tweets." → X has a built-in archive download in account settings. No scraping needed.

"I want competitor-tweet engagement metrics." → X API public metrics include retweet/like/reply counts. Sufficient for most analysis.
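
As a rough illustration of working with those counts, the sketch below sums the four `public_metrics` fields into a single engagement number and ranks tweets by it. The unweighted sum is an arbitrary choice for the example, the function names are hypothetical, and the sample numbers are made up.

```python
METRIC_KEYS = ("retweet_count", "reply_count", "like_count", "quote_count")


def engagement_total(public_metrics):
    """Sum the engagement counts in one tweet's public_metrics dict."""
    return sum(public_metrics.get(key, 0) for key in METRIC_KEYS)


def rank_by_engagement(tweets):
    """Return tweets sorted from most to least total engagement."""
    return sorted(
        tweets,
        key=lambda t: engagement_total(t["public_metrics"]),
        reverse=True,
    )


# Made-up sample data shaped like API results converted to dicts.
sample_tweets = [
    {"id": "a", "public_metrics": {"retweet_count": 2, "reply_count": 1,
                                   "like_count": 10, "quote_count": 0}},
    {"id": "b", "public_metrics": {"retweet_count": 30, "reply_count": 4,
                                   "like_count": 50, "quote_count": 6}},
    {"id": "c", "public_metrics": {"like_count": 5}},
]

for tweet in rank_by_engagement(sample_tweets):
    print(tweet["id"], engagement_total(tweet["public_metrics"]))
```

If replies or quotes matter more than likes for your analysis, swap the plain sum for a weighted one.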

"I want to extract leads from people who tweeted about my topic." → X API search → output list. Legitimate use case, fine on Basic tier.

What to read next

Web Scraping Legal & Ethics — the full landscape
Web Scraping Tools Comparison — when managed services earn their cost
Web Scraping FAQ — the 25 most-asked questions

If you're trying to build something Twitter-data-shaped and the API isn't fitting, send the use case to info@luba.media. Often there's a legitimate alternate path I can recommend (Bluesky / Mastodon / Reddit / niche-API solutions) that gives you the same outcome without the ToS minefield.

Hire me to build this for your site

I quote fixed-price and ship in 7-10 days. Send a brief to info@luba.media.
