How to Scrape Twitter / X in 2026 (Honest: Don't, Use the API)
Short version: don't. Use the official API.
This is the most-asked-about and least-recommended scraping target on the public web. Here's why, what the alternatives actually look like, and the narrow cases where DIY scraping might still be defensible.
What changed in 2023
Twitter/X did three things in 2023 that broke the whole DIY-scraping ecosystem:
- Closed public read API access. Pre-2023, anyone with a free key could read tweets. Now: minimum $100/mo for the basic developer tier, $5,000/mo for full access.
- Required login for almost every page. Search, profiles, and individual tweets all sit behind a login wall. Anonymous scraping returns near-empty pages.
- Aggressively litigated open-source scraping libraries. twint, snscrape and similar tools were broken (and their maintainers received C&D letters). The major maintained scrapers are now defunct or deprecated.
The result: every "How to scrape Twitter" tutorial older than mid-2023 is wrong.
What works in 2026 (approximate ranking)
| Approach | Cost | Risk | Recommendation |
|---|---|---|---|
| Official X API (Basic tier) | $100-200/mo | None (legal, ToS-compliant) | Default for most use cases |
| Official X API (Pro tier) | $5,000/mo | None | Enterprise / high-volume only |
| Apify Twitter scraper actor | ~$20-50/mo | Medium ToS, low legal | Fast prototype, low volume |
| Bright Data Twitter dataset | $100-1,000/mo | Low (their ToS, not yours) | Bulk historical data |
| DIY with logged-in session | Variable | High account-ban + ToS | Last resort, brittle |
The official API path (recommended)
Twitter's developer API is what you should be using.
import tweepy

client = tweepy.Client(bearer_token="YOUR_TOKEN")

# Search recent tweets matching a query
tweets = client.search_recent_tweets(
    query="web scraping -is:retweet lang:en",
    max_results=100,
    tweet_fields=["created_at", "public_metrics", "author_id"],
)

# tweets.data is None when nothing matched, hence the "or []"
for tweet in tweets.data or []:
    print(tweet.created_at, tweet.text[:120])
Pros: legal, well-documented, includes engagement metrics that scraping won't capture cleanly.
Cons: rate-limited (300-1500 requests per 15 min depending on tier), historical data is paywalled at $5k+/mo, and search only goes back ~7 days on the Basic tier.
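One common way to live with the ~7-day search window is to poll on a schedule and accumulate the results yourself. The sketch below shows only the storage side, using the standard-library sqlite3 module. The table layout, the function names (open_store, save_tweets, count_tweets) and the plain-dict tweet shape are illustrative assumptions, not part of the X API or tweepy.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a small tweet store. The schema is a hypothetical example."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS tweets (
            id         TEXT PRIMARY KEY,
            created_at TEXT NOT NULL,
            text       TEXT NOT NULL
        )
        """
    )
    return conn


def count_tweets(conn: sqlite3.Connection) -> int:
    """Return the number of tweets currently stored."""
    return conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]


def save_tweets(conn: sqlite3.Connection, tweets: list[dict]) -> int:
    """Insert tweets, skipping ids already stored. Returns how many rows were new."""
    before = count_tweets(conn)
    conn.executemany(
        # OR IGNORE makes overlapping polls harmless: duplicate ids are dropped.
        "INSERT OR IGNORE INTO tweets (id, created_at, text) VALUES (?, ?, ?)",
        [(t["id"], t["created_at"], t["text"]) for t in tweets],
    )
    conn.commit()
    return count_tweets(conn) - before


if __name__ == "__main__":
    store = open_store()
    first_poll = [
        {"id": "1", "created_at": "2026-01-01T00:00:00Z", "text": "hello"},
        {"id": "2", "created_at": "2026-01-01T00:05:00Z", "text": "world"},
    ]
    print(save_tweets(store, first_poll))  # both rows are new
    print(save_tweets(store, first_poll))  # nothing new on a repeat poll
```

Keying on the tweet id means each poll can safely re-request an overlapping time range, which is simpler than tracking exactly where the previous poll stopped.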
What the API genuinely doesn't give you
A short list of cases where the API is a poor fit:
- Historical search beyond 7 days on the basic tier (the Pro tier handles this; the free Academic Research track that used to cover it was discontinued in 2023)
- Replies to a specific tweet in a clean structured way (the API returns conversation threads but it's awkward)
- "View counts" on tweets that the web UI shows
- Spaces transcripts (audio rooms)
- Community posts (the subreddit-shaped feature)
For these, the alternatives are:
- Apify's Twitter actor — community-maintained, ToS-grey, ~$20-50/mo for moderate use. Works today; expect breakage when X changes the front-end.
- Bright Data Twitter dataset — pre-scraped, refreshed periodically. Their lawyers have figured out the ToS angle.
- Direct partnership / X Pro account. If you're a real business, paying is usually the most straightforward route.
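On the reply-thread case mentioned above: the awkwardness is mostly that replies come back as a flat list, so the tree has to be rebuilt client-side. The sketch below assumes you have already fetched a conversation's tweets as plain dicts that carry the v2 referenced_tweets field. The function names and the sample data are hypothetical, and it is one possible approach rather than an official recipe.

```python
from collections import defaultdict
from typing import Iterator

# Hypothetical sample: tweet "1" is the root; "5" quotes it rather than replying.
SAMPLE_TWEETS = [
    {"id": "2", "text": "first reply", "referenced_tweets": [{"type": "replied_to", "id": "1"}]},
    {"id": "3", "text": "reply to the reply", "referenced_tweets": [{"type": "replied_to", "id": "2"}]},
    {"id": "4", "text": "second reply", "referenced_tweets": [{"type": "replied_to", "id": "1"}]},
    {"id": "5", "text": "a quote, not a reply", "referenced_tweets": [{"type": "quoted", "id": "1"}]},
]


def build_reply_tree(tweets: list[dict]) -> dict[str, list[str]]:
    """Map each parent tweet id to the ids of its direct replies, in input order."""
    children: dict[str, list[str]] = defaultdict(list)
    for tweet in tweets:
        for ref in tweet.get("referenced_tweets") or []:
            # Only "replied_to" references form the thread; quotes are skipped.
            if ref.get("type") == "replied_to":
                children[ref["id"]].append(tweet["id"])
    return dict(children)


def walk_thread(root_id: str, tree: dict[str, list[str]], depth: int = 0) -> Iterator[tuple[int, str]]:
    """Yield (depth, tweet_id) pairs depth-first, starting at the root tweet."""
    yield depth, root_id
    for child_id in tree.get(root_id, []):
        yield from walk_thread(child_id, tree, depth + 1)


if __name__ == "__main__":
    tree = build_reply_tree(SAMPLE_TWEETS)
    for depth, tweet_id in walk_thread("1", tree):
        print("  " * depth + tweet_id)
```

Replies whose parent is missing from the fetched set (deleted or protected tweets, for example) simply never appear under the root, so comparing the walked ids against the fetched ids is a quick way to spot orphans.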
DIY scraping path (last resort, not recommended)
If you absolutely must roll your own:
# This is shown for educational completeness. Not recommended in production.
from playwright.sync_api import sync_playwright
import time


def scrape_user_tweets(username: str, session_dir: str = "./twitter_session"):
    with sync_playwright() as p:
        # launch_persistent_context returns a browser context, not a Browser.
        context = p.chromium.launch_persistent_context(
            user_data_dir=session_dir,
            headless=True,
        )
        page = context.new_page()
        # First-run: log in interactively (set headless=False for that run),
        # which saves the session. Subsequent runs reuse it via persistent_context.
        page.goto(f"https://x.com/{username}")
        page.wait_for_selector("article[data-testid='tweet']", timeout=15_000)
        tweets = []
        seen = set()  # articles stay in the DOM between scrolls, so dedupe
        # Scroll-load: X uses infinite scroll. Capture as we scroll.
        last_height = 0
        for _ in range(20):
            articles = page.query_selector_all("article[data-testid='tweet']")
            for art in articles:
                text_el = art.query_selector("div[data-testid='tweetText']")
                if text_el:
                    text = text_el.inner_text()[:280]
                    if text not in seen:
                        seen.add(text)
                        tweets.append({"text": text})
            # Scroll down
            page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
            time.sleep(1.5)
            new_height = page.evaluate("document.body.scrollHeight")
            if new_height == last_height:
                break  # page stopped growing: no more tweets to load
            last_height = new_height
        context.close()
        return tweets
Why this is risky:
- Your account gets shadowbanned or terminated if X detects the automation pattern
- The DOM selectors change frequently — expect 1-2 days/month of maintenance
- ToS violation is unambiguous; if X's lawyers wanted to escalate they could
- Doesn't scale — you're rate-limited by what one logged-in browser can do
If you do this, use a throwaway account, not your real one.
Legal & ethical territory
Twitter/X is an explicit no-scrape zone post-2023. The risks:
- ToS contract claim — you logged in and accepted terms; you're contractually bound.
- Account termination — common, swift, no recourse.
- CFAA / DMCA escalation — historically rare but not zero. Musk-era X has shown more litigation appetite than pre-acquisition Twitter.
- Data redistribution — even if you scrape successfully, sharing/selling/republishing the data adds copyright + ToS risk.
What I'd recommend instead by use case
"I want to monitor mentions of my brand." → X API Basic tier, search_recent_tweets() with brand keywords. $100/mo. Done.
"I want to track sentiment on a topic over time." → X API Basic tier + your own DB to accumulate the data. $100/mo + storage.
"I want historical Twitter data for a research project." → Pay for X Pro ($5,000/mo), or buy a pre-scraped dataset from Bright Data/Apify. The free Academic Research track that used to serve this case was discontinued in 2023.
"I want to back up my own tweets." → X has a built-in archive download in account settings. No scraping needed.
"I want competitor-tweet engagement metrics." → X API public metrics include retweet/like/reply counts. Sufficient for most analysis.
"I want to extract leads from people who tweeted about my topic." → X API search → output list. Legitimate use case, fine on Basic tier.
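For the brand-monitoring and lead-finding cases above, most of the work is writing a good search query. The helper below is a minimal sketch that ORs a list of keywords together and appends the -is:retweet and lang: filters used earlier in this post. The function name is hypothetical, and the 512-character limit is an assumption to verify against your tier's documentation.

```python
MAX_QUERY_LENGTH = 512  # assumed per-query character limit; verify for your tier


def build_brand_query(keywords: list[str], lang: str = "en", include_retweets: bool = False) -> str:
    """Combine brand keywords into one recent-search query string."""
    if not keywords:
        raise ValueError("at least one keyword is required")
    # Quote multi-word phrases so they match exactly, then OR the terms together.
    terms = [f'"{kw}"' if " " in kw else kw for kw in keywords]
    query = "(" + " OR ".join(terms) + ")"
    if not include_retweets:
        query += " -is:retweet"
    if lang:
        query += f" lang:{lang}"
    if len(query) > MAX_QUERY_LENGTH:
        raise ValueError(f"query is {len(query)} chars; assumed limit is {MAX_QUERY_LENGTH}")
    return query


if __name__ == "__main__":
    # "Acme" and "Acme Cloud" are placeholder brand names.
    print(build_brand_query(["Acme", "Acme Cloud"]))
```

The resulting string can be passed as the query argument to search_recent_tweets(). Failing early on an over-long query is cheaper than spending a rate-limited request to find out.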
What to read next
- Web Scraping Legal & Ethics — the full landscape
- Web Scraping Tools Comparison — when managed services earn their cost
- Web Scraping FAQ — the 25 most-asked questions
If you're trying to build something Twitter-data-shaped and the API isn't fitting, send the use case to info@luba.media. Often there's a legitimate alternate path I can recommend (Bluesky / Mastodon / Reddit / niche-API solutions) that gives you the same outcome without the ToS minefield.