Is Web Scraping Legal in the US? What the Law Actually Says

A few weeks ago, a colleague on our sales team asked me a question I hear constantly: "Can we scrape leads from this public business directory, or will we get sued?" He'd found a goldmine of prospect data sitting right there on the open web — no login, no paywall — but a quick Google search had him convinced he might end up in handcuffs.

That kind of anxiety is everywhere. Automated traffic now accounts for roughly 51% of all web traffic, the web scraping software market is projected to grow from about $1.08 billion in 2025 to $3.59 billion by 2031, and yet most of the legal guidance floating around online is either outdated, oversimplified, or flat-out wrong. The hiQ v. LinkedIn case from 2022? Nearly every article treats it like a Supreme Court ruling that "all scraping is legal." (Spoiler: it isn't, and it wasn't.)

Meanwhile, major new cases in 2024 and 2025 — involving X (formerly Twitter), Meta, Reddit, Google, and AI companies — are actively reshaping the rules, and almost nobody is covering them. This guide covers what U.S. law actually says about web scraping in 2026, separates the myths from reality, and gives you a practical framework for figuring out what you can and can't do.

What Is Web Scraping (And Why Do Businesses Care)?

Web scraping is using automated software to collect information from websites and organize it into structured data — think spreadsheets, databases, or CRM records.

More precisely, a scraper visits web pages, reads the underlying HTML, and pulls out specific data points — prices, names, addresses, product specs, whatever you need — into neat rows and columns. It's the digital equivalent of hiring someone to copy information from a website into Excel, except a bot does it in seconds instead of hours.
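
As a rough illustration of the "read the HTML, pull out data points, write rows and columns" step described above, here is a minimal sketch using only Python's standard library. The sample markup, the class names (`product`, `name`, `price`), and the helper names are all assumptions made up for this example; real pages are structured differently, and this is not how any particular scraping tool works internally.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page markup, inlined so the example needs no network access.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Desk Lamp</span><span class="price">$24.99</span></li>
  <li class="product"><span class="name">Office Chair</span><span class="price">$149.00</span></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collects the text of <span class="name"> and <span class="price"> into rows."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished rows, one dict per <li>
        self._field = None   # name of the field currently being read, if any
        self._current = {}   # row under construction

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("name", "price"):
            self._field = cls

    def handle_data(self, data):
        # Only keep text that sits inside one of the spans we care about.
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current:
            self.rows.append(self._current)
            self._current = {}


def extract_rows(html):
    parser = ProductParser()
    parser.feed(html)
    return parser.rows


def to_csv(rows):
    """Serialize the rows as CSV text, ready to open in a spreadsheet."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["name", "price"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = extract_rows(SAMPLE_HTML)
print(to_csv(rows))
```

The point is only that the scraper reads the same HTML a browser would receive and reshapes it into structured records.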

Scraping public pages is NOT hacking. It accesses the same information any visitor would see in their browser.

And it's not some niche developer trick. Search engines, price comparison sites, real estate platforms, market research dashboards, and AI-powered tools all rely on web crawling and scraping to function. If you've ever used Google, checked a flight aggregator, or browsed Zillow, you've benefited from scraping.

The most common business use cases I run into:

Lead generation: Extracting company names, websites, job titles, or public contact details from business directories.
Competitor price monitoring: Ecommerce teams tracking rival SKU prices, availability, and shipping info.
Real estate intelligence: Aggregating public property listings, prices, and market trends.
Product research: Pulling product specs, ratings, availability, and category data from retail sites.
Market intelligence: Tracking job postings, store openings, news signals, or public financial data.

The technique itself is neutral. The legal analysis turns on how you access the data and what you do with it afterward.

Is Web Scraping Legal in the US? The Short Answer

There is no U.S. federal law that bans web scraping outright. Scraping publicly available data is generally permitted.

But — and this is a big but — legality depends on several factors: the type of data, how you access it, whether you agreed to any terms of service, whether the data includes personal information, and what you plan to do with it.

The biggest source of confusion in forums, Reddit threads, and even legal blogs? People conflate "illegal" with "against a website's terms of service." These are very different things. Breaking a website's rules might get your IP blocked or your account banned. Breaking a federal law could mean a lawsuit or, in rare cases, criminal prosecution. Most scraping consequences fall squarely in the civil category.

The rest of this article unpacks the key laws, the landmark court cases (including ones from 2024 and 2025 that almost nobody covers), and a practical decision framework you can actually use.

The Three Types of "Illegal": Criminal, Civil, and ToS Violations

Time to clear up the single biggest misconception about web scraping law. When someone asks "is web scraping illegal?", they're usually lumping together three completely different categories of risk. Separating them changes the whole conversation.

Type of Liability	What Triggers It	Potential Consequence	Severity
Criminal (CFAA)	Accessing data behind authentication barriers without authorization, fraud, credential misuse	Federal prosecution, fines, imprisonment	🔴 Severe — but extremely rare for ordinary business scraping
Civil lawsuit	Copyright infringement, trespass to chattels, breach of contract, trade secret misappropriation, privacy violations	Monetary damages, injunctions, data deletion	🟡 Significant
ToS violation	Breaching browsewrap or clickwrap terms of service	Account termination, IP blocking, cease-and-desist, possible civil suit	🟢 Low to moderate

The Department of Justice's 2022 CFAA charging policy explicitly states that ordinary terms-of-service violations — like creating a fake account or violating website rules — are not by themselves sufficient for federal criminal charges. That's a big deal.

The practical takeaway: if you're a sales team scraping public business listings or an ecommerce team monitoring competitor prices, you're almost certainly looking at civil risk management, not criminal exposure. That doesn't mean you can ignore the rules, but it should recalibrate your anxiety level.

The Key US Laws That Apply to Web Scraping

Four legal pillars intersect with web scraping in the U.S., and each one addresses a different piece of the puzzle.

The Computer Fraud and Abuse Act (CFAA)

The CFAA (18 U.S.C. § 1030) was originally written to prosecute computer hacking. Over the years, it became the go-to statute for scraping lawsuits, usually under the theory that a scraper accessed a website "without authorization."

Then came Van Buren v. United States (2021). The Supreme Court held that a person "exceeds authorized access" under the CFAA only when they access areas of a computer (files, folders, databases) that are off-limits to them. Simply misusing information you're otherwise allowed to see doesn't count.

Scraping implications:

Lower CFAA risk: Public web pages available to anyone without a login. No gate, no "unauthorized access" problem.
Higher CFAA risk: Data behind logins, paywalls, access tokens, session manipulation, or revoked access.

The hiQ v. LinkedIn case (which we'll dissect in detail below) reinforced this for public data. But the CFAA is only one piece of the puzzle.

Copyright Law and the DMCA

U.S. copyright law protects original creative expression (articles, photos, videos, creative product descriptions) but not raw facts. The Supreme Court's decision in Feist Publications v. Rural Telephone Service (1991) is the landmark case here: facts like names, addresses, and phone numbers are not copyrightable, no matter how much effort went into compiling them.

Risk tiers for scraped data:

What You're Scraping	Copyright Risk	Why
Prices, product names, addresses, dates, specs	Lower	These are facts
Full articles, photos, videos, creative reviews	Higher	These are expressive works
Curated databases, rankings, editorial taxonomies	Medium-high	Selection and arrangement may be protected
Paywalled or DRM-protected content	High	Copyright plus access-control issues

The DMCA's anti-circumvention provision (17 U.S.C. § 1201) adds another layer: bypassing technical protection measures (paywalls, DRM, certain anti-bot systems) to access copyrighted content can trigger liability even if you never copy the content itself. This is being tested aggressively in 2025-2026 cases, including Google v. SerpApi, where Google alleges DMCA violations for circumventing its SearchGuard anti-bot system.

Fair use matters too — transformative use (analyzing, aggregating, or building on data rather than just republishing it) is generally safer than copying and reposting someone else's content.

Contract Law: Terms of Service (Browsewrap vs. Clickwrap)

Many websites include anti-scraping language in their terms of service — but enforceability depends entirely on how you encountered those terms.

Contract Type	Enforceability	What It Means for Scrapers
Clickwrap (you click "I agree")	Strong	Courts consistently enforce these. Anti-scraping terms can support civil claims.
Sign-in wrap (notice near login)	Fact-specific	Depends on how conspicuous the notice was.
Browsewrap (linked in footer)	Weaker	Courts are skeptical when users had no real notice.
Account/API terms	Stronger	Logged-in scraping or API misuse is much higher risk.

In Meta v. Bright Data (2024), the court found that Meta's terms didn't cover logged-out public scraping in the way Meta argued — Bright Data hadn't been shown to use logged-in accounts for the public scraping at issue. That's a meaningful distinction.

Practical advice: if you never logged in, never clicked "I agree," and are scraping only public pages, browsewrap restrictions are harder for a website to enforce against you. But always check the ToS before scraping, especially if you've created an account.

US State Privacy Laws (CCPA and Beyond)

If the data you're scraping includes personal information — names, emails, phone numbers, location data — state privacy laws may apply. And the patchwork is growing fast. The IAPP counted 19 enacted comprehensive state privacy laws by mid-2025, and MultiState reported 20 states with comprehensive privacy laws in effect in 2026.

Most of these laws include exceptions for "publicly available" personal information, but the definitions vary. And downstream use — selling, sharing, or profiling with that data — can still trigger obligations even if the initial collection is exempt.

State Law	Effective	Covers Scraped PII?	Opt-Out Requirement	Penalty Range
CCPA/CPRA (California)	2020/2023	Yes	Sale/sharing opt-out; GPC recognized	$2,663–$7,988/violation (2025 adjusted)
CPA (Colorado)	2023	Yes	Universal opt-out/GPC from July 2024	Civil penalties under deceptive trade practice framework
CTDPA (Connecticut)	2023	Yes	Opt-out preference signals (OOPS)/GPC from Jan. 2025	Up to $5,000/willful violation
VCDPA (Virginia)	2023	Yes	Opt-out right	Up to $7,500/violation
TDPSA (Texas)	2024	Yes	Universal opt-out from Jan. 2025	Up to $7,500/violation
+ 14 more enacted through 2026 (listed below)	Varies	Varies	Varies	Varies

Additional states with enacted laws include Utah, Oregon, Montana, Delaware, Iowa, Nebraska, New Hampshire, New Jersey, Tennessee, Minnesota, Maryland, Indiana, Kentucky, and Rhode Island. Alabama enacted a law effective May 1, 2027.

For business users scraping product prices, business listings, or market data — non-PII, factual information — privacy risk is substantially lower. Tools like Thunderbit focus on structured extraction from public pages (product data, business directories, real estate listings), which aligns with the lowest-risk scraping category.

Landmark Web Scraping Cases: A Timeline from 2000 to 2026

This is where I think most guides on this topic fall short. Nearly every article stops at hiQ v. LinkedIn (2022) and ignores the rulings that are actively shaping scraping law right now. Here's the full timeline:

Case	Year	Key Holding	Impact on Scrapers
eBay v. Bidder's Edge	2000	Preliminary injunction under trespass to chattels; crawler burden on servers mattered	⚠️ High-volume scraping that burdens servers can create civil liability
Facebook v. Power Ventures	2016	CFAA liability after cease-and-desist and continued access using Facebook systems	⚠️ C&D plus authenticated/gated access is high risk
Van Buren v. US	2021	CFAA "exceeds authorized access" requires accessing off-limits computer areas	✅ Narrowed CFAA scope significantly
hiQ v. LinkedIn	2022	Accessing public data not a CFAA violation (preliminary injunction, later settled)	✅ Public data ≠ "unauthorized access" — but not a final ruling
Meta v. Bright Data	2024	Bright Data won summary judgment on Meta's contract theory for logged-out public scraping	✅ Terms may not bind logged-out scraping absent assent
X Corp. v. Bright Data	2024	May dismissal of many claims; November order denied scraping/selling-based claims	✅ Public data copying claims weakened
Compulife v. Newman/Rutstein	2024-2025	Trade-secret liability for mass extraction of insurance quote data; cert denied Feb. 2025	⚠️ Public-facing data can still be a protected database
Reddit v. Perplexity/SerpApi/Oxylabs/AWMProxy	2025-2026	Alleges industrial-scale indirect scraping through Google results	⚠️ AI-era cases target data supply chains
Google v. SerpApi	2025-2026	DMCA §1201 claims over alleged anti-bot circumvention	⚠️ Tests whether anti-bot systems are DMCA access controls

The trend line is clear: courts are increasingly protecting access to public data under the CFAA, but copyright, contract, privacy, trade secret, and infrastructure claims remain fully independent risks. And the AI training wave is creating entirely new legal questions.

Setting the Record Straight: What hiQ v. LinkedIn Actually Decided

This is the most misunderstood case in all of web scraping law. I've seen it cited in blog posts, Reddit threads, and even legal summaries as proof that "public web scraping is legal." It's not that simple.

Here's what actually happened:

What hiQ held: The Ninth Circuit affirmed a preliminary injunction — a temporary order — preventing LinkedIn from blocking hiQ's scraping of public LinkedIn profiles. The court said that accessing publicly available data likely did not violate the CFAA. Key word: likely. Source: hiQ Labs v. LinkedIn, Ninth Circuit.

What hiQ did NOT establish:

A blanket right to scrape any public website
A final ruling on the merits. The Supreme Court vacated and remanded after Van Buren, the Ninth Circuit reaffirmed, and then the case settled in late 2022 without a final court decision

For context, the reported settlement included $500,000, an injunction, and data/software destruction obligations.

Why this matters for you: hiQ is encouraging for scrapers of public data. It signals that courts are wary of platforms creating private monopolies over information they don't own. But it's not a legal guarantee. Other claims — copyright, contract, privacy, trade secrets — were never resolved. Post-Van Buren, the CFAA landscape is clearer, but relying solely on hiQ as a legal shield would be a mistake.

Getting this right is what separates informed risk management from wishful thinking.

Can I Legally Scrape This? A Practical Decision Flowchart

Scraping legality feels like a "grey area" — I hear that constantly. So instead of more legal theory, here's a decision framework you can actually use. Five questions, any scraping project:

1. Is the data publicly accessible (no login required)?

If NO → Higher CFAA risk. Seek permission or legal review before proceeding.
If YES → Move to question 2.

2. Are you bypassing any technical barriers (CAPTCHA, IP blocks, rate limits, paywalls)?

If YES → Potential DMCA and CFAA issues. Stop or escalate to legal counsel.
If NO → Move to question 3.

3. Did you agree to a clickwrap ToS that prohibits scraping?

If YES → Civil contract liability risk. Consider whether the data is available from another source or seek permission.
If NO → Move to question 4.

4. Does the data include personal information (PII)?

If YES → Check CCPA and applicable state privacy laws. Ensure you have a compliant use case and respect opt-out rights.
If NO → Move to question 5.

5. What will you do with the data?

Commercial republication of copyrighted content (full articles, photos, videos) → Copyright risk.
Transformative analysis, internal research, or factual data use (prices, specs, listings) → Generally lower risk.

If you land in the "public pages, no bypass, no clickwrap, non-PII, factual data for internal analysis" zone, you're in the lowest-risk category. That's exactly the kind of workflow Thunderbit is designed for — extracting structured, factual data from public web pages like product listings, business directories, and real estate data, then exporting to Excel, Google Sheets, Airtable, or Notion for your own analysis.

Bookmark this flowchart. It won't replace a lawyer, but it'll save you from a lot of unnecessary panic.
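
For teams that want to build the five questions into an internal review checklist, one possible encoding is sketched below. The class name, field names, and flag wording are assumptions invented for this example. One deliberate difference from the flowchart: it evaluates all five questions instead of stopping at the first problem, so a reviewer sees every flag at once. It is a triage aid under those assumptions, not a legal determination.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScrapeProject:
    """Answers to the five questions; field names are illustrative."""
    publicly_accessible: bool       # Q1: no login required
    bypasses_barriers: bool         # Q2: CAPTCHA, IP blocks, rate limits, paywalls
    agreed_to_clickwrap_ban: bool   # Q3: clicked "I agree" to anti-scraping terms
    includes_pii: bool              # Q4: names, emails, phone numbers, etc.
    republishes_copyrighted: bool   # Q5: commercial republication of expressive works


def risk_flags(p: ScrapeProject) -> List[str]:
    """Return one flag per question that lands on the higher-risk branch."""
    flags = []
    if not p.publicly_accessible:
        flags.append("Q1: login-gated data, higher CFAA risk; seek permission or legal review")
    if p.bypasses_barriers:
        flags.append("Q2: bypassing technical barriers, potential DMCA/CFAA issues; stop or escalate")
    if p.agreed_to_clickwrap_ban:
        flags.append("Q3: clickwrap ToS prohibits scraping, civil contract risk")
    if p.includes_pii:
        flags.append("Q4: personal information, check CCPA and other state privacy laws")
    if p.republishes_copyrighted:
        flags.append("Q5: commercial republication of copyrighted content, copyright risk")
    return flags


# Public pages, no bypass, no clickwrap, non-PII, internal analysis.
lowest_risk = ScrapeProject(True, False, False, False, False)
print(risk_flags(lowest_risk))  # prints []
```

An empty list corresponds to the lowest-risk zone described above; any non-empty list is a cue to pause and get a second opinion.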

AI Training and Web Scraping: The New Legal Frontier

AI has added an entirely new layer of complexity to scraping law. Scraping data to train large language models, image generators, and other AI systems is now a major legal battleground — and the courts haven't settled the key questions yet.

Here's where things stand:

Case	Status (2026)	Key Issue
NYT v. OpenAI/Microsoft	Ongoing. Core copyright claims allowed to proceed in April 2025; discovery disputes include 20M+ ChatGPT logs.	Does training on scraped news articles constitute fair use or copyright infringement?
Bartz v. Anthropic	Judge Alsup held certain training uses were fair use, but pirated source acquisition was not. Reported settlement: ~$1.5B.	Training may be transformative, but pirated source copying is a separate problem.
Thomson Reuters v. Ross	Delaware court rejected fair use for use of Westlaw headnotes to build a competing legal research product.	Direct substitute products face higher copyright risk.
Getty v. Stability AI	UK case largely favored Stability in 2025; U.S. case pending.	Image-training law remains unsettled.

The U.S. Copyright Office's 2025 AI report adds useful nuance: training on large, diverse datasets may often be transformative, but pirate-source copying and uses that directly compete with copyright owners' markets are much weaker fair use arguments.

For most business users reading this article, the distinction is straightforward: scraping data for your own analysis or business operations (lead gen, price monitoring, market research) is a very different legal animal than scraping data to train and commercialize an AI model. The former carries lower copyright risk. The latter is where the big lawsuits are happening.

How to Scrape Data Responsibly (Best Practices for Business Teams)

Enough law. Here's how to actually scrape data without creating legal headaches for your team.

Stick to Publicly Available Data

Focus on data that anyone can see without logging in — product listings, business directories, public records, pricing pages. The moment you're behind a login, you've moved into a higher-risk zone.

Don't Bypass Technical Barriers

If a site uses CAPTCHAs, IP blocks, rate limits, or paywalls, those are signals. Circumventing them can trigger DMCA, CFAA, or contract claims. If the data is important enough, look for an official API or data partnership instead.

Check the Terms of Service

Especially if you've created an account or clicked "I agree." Read the ToS for anti-scraping clauses. If the terms prohibit scraping and you've agreed to them, consider whether the data is available from another source.

Minimize Personal Data Collection

If you're collecting PII (names, emails, phone numbers), make sure you have a compliant use case under applicable state privacy laws. Scraping factual business data — company names, product prices, listing details — is substantially lower risk than scraping individual consumer profiles.
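
One way to put data minimization into practice is to filter every scraped record through an allowlist before it is stored. The sketch below assumes hypothetical field names (`company_name`, `owner_email`, and so on) and uses a deliberately loose email pattern. It illustrates the idea and is not a complete PII detector.

```python
import re

# Hypothetical allowlist: only factual business fields are kept.
ALLOWED_FIELDS = {"company_name", "website", "product", "price"}

# Loose pattern for email-like strings; an assumption, not a full validator.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def minimize(record, allowed=ALLOWED_FIELDS):
    """Keep only allowlisted fields, and drop any value that looks like an email."""
    kept = {}
    for field, value in record.items():
        if field not in allowed:
            continue
        if isinstance(value, str) and EMAIL_PATTERN.search(value):
            continue
        kept[field] = value
    return kept


raw = {
    "company_name": "Acme Tools",
    "website": "https://acme.example",
    "owner_email": "jane@acme.example",
    "owner_phone": "555-0100",
    "price": "$19.00",
}
print(minimize(raw))
```

An allowlist fails closed: a new field that appears on the page is dropped until someone decides it is needed.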

Respect Robots.txt and Rate Limits

Robots.txt (RFC 9309) isn't legally binding on its own, but respecting it demonstrates good faith. And don't hammer a website's servers — throttle your requests, use reasonable intervals, and don't cause infrastructure harm.
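
A minimal sketch of what "respect robots.txt and throttle" can look like in practice, using Python's standard-library `urllib.robotparser`. The user-agent string, the sample robots.txt rules, the one-second default delay, and the `fetch` callable are assumptions for illustration. In real use you would load the site's actual robots.txt and plug in your own HTTP client.

```python
import time
from urllib.robotparser import RobotFileParser

USER_AGENT = "example-research-bot"  # hypothetical user-agent string

# Sample rules, inlined so the example runs without network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_text):
    """Parse robots.txt text into a RobotFileParser."""
    rp = RobotFileParser()
    rp.parse(robots_text.splitlines())
    return rp


def allowed_urls(rp, urls, user_agent=USER_AGENT):
    """Keep only the URLs that robots.txt permits for this user agent."""
    return [u for u in urls if rp.can_fetch(user_agent, u)]


def polite_delay(rp, user_agent=USER_AGENT, default=1.0):
    """Use the site's Crawl-delay if it declares one, else a default pause."""
    delay = rp.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


def crawl(rp, urls, fetch, sleep=time.sleep):
    """Fetch permitted URLs one at a time, pausing between requests."""
    delay = polite_delay(rp)
    results = []
    for url in allowed_urls(rp, urls):
        results.append(fetch(url))
        sleep(delay)
    return results


rp = build_parser(ROBOTS_TXT)
candidates = ["https://example.com/products", "https://example.com/private/data"]
print(allowed_urls(rp, candidates))
```

Passing `fetch` and `sleep` in as parameters keeps the throttling logic separate from the HTTP client, which makes it easy to test without touching a live site.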

Use Data for Analysis, Not Republication

Transformative use — analysis, aggregation, internal research, competitive intelligence — is far safer than copying and reposting someone else's articles, images, or reviews. If you're building dashboards or spreadsheets for your team, you're in a better position than if you're republishing scraped content on your own website.

Choose Tools Designed for Compliant Scraping

This is where I'll mention what we've built at Thunderbit. Our AI web scraper Chrome extension is designed for business users who want to extract structured data from public web pages — product listings, business directories, real estate data, lead information — without needing to write code or bypass technical barriers. The AI reads the page, suggests fields, and lets you export to Excel, Google Sheets, Airtable, or Notion. It's built for the lowest-risk branch of the decision flowchart above: public pages, factual data, no login bypass.

That said, no tool makes you immune from legal risk. The responsibility for what you scrape and how you use it always rests with you.

Keep Logs and Stop on Cease-and-Desist

Document your scraping activity and business purpose. If you receive a cease-and-desist letter, stop and consult legal counsel. Continuing to scrape after formal notice raises your risk profile significantly, especially if gated systems are involved.
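
The logging and stop-on-notice habits above can be combined in a few lines. This sketch assumes a hypothetical in-memory log (a list of JSON lines) and a hand-maintained set of domains that have sent a cease-and-desist. The function names and record fields are illustrative, and a real deployment would write to durable storage.

```python
import json
from datetime import datetime, timezone
from urllib.parse import urlparse


def is_blocked(url, blocked_domains):
    """True if the URL's host is a blocked domain or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in blocked_domains)


def log_request(log_lines, url, purpose, blocked_domains):
    """Append one JSON line per attempted request; return True if scraping may proceed."""
    allowed = not is_blocked(url, blocked_domains)
    log_lines.append(json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "url": url,
        "purpose": purpose,
        "allowed": allowed,
    }))
    return allowed


# Hypothetical: example.org sent a cease-and-desist, so it is on the stop list.
blocked = {"example.org"}
audit_log = []
print(log_request(audit_log, "https://www.example.org/listings", "price monitoring", blocked))
print(log_request(audit_log, "https://example.com/listings", "price monitoring", blocked))
```

Refused attempts are logged too, so the record shows both what was collected and that scraping stopped once notice was received.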

Key Takeaways on Web Scraping Legality in the US

The short version:

No US federal law bans web scraping. Scraping publicly available factual data is generally permitted.
Legality depends on what you scrape, how you access it, and what you do with it. Public pages + factual data + internal analysis = lowest risk.
The CFAA's scope has narrowed after Van Buren and hiQ, but copyright, contract, privacy, and trade secret claims are independent risks that still apply.
Criminal liability is rare for typical business scraping. Most risks are civil — lawsuits, not handcuffs.
hiQ v. LinkedIn is not a blanket permission slip. It was a preliminary-injunction ruling in a case that later settled. Encouraging, but not a guarantee.
State privacy laws matter when PII is involved, but non-PII data (prices, listings, specs) carries the lowest risk.
AI training use cases are a new and unsettled legal frontier. Business scraping for your own analysis is a different risk profile than scraping to build commercial AI models.
Following best practices (public data, respect ToS, avoid PII, don't bypass barriers, use data responsibly) keeps your team in the lowest-risk zone.

A necessary disclaimer: this article is informational, not legal advice. If you're planning a large-scale scraping operation or dealing with sensitive data, consult a qualified attorney. But for the sales manager who just wants to pull leads from a public directory, or the ecommerce team monitoring competitor prices? The law is more on your side than you probably think.

If you want to see how Thunderbit makes this kind of public-data extraction simple — no code, no bypass, just structured data into your workflow — check out our quick start guide or grab the Chrome extension and try it yourself.

FAQs

1. Is web scraping legal in the US in 2026?

Yes, web scraping is generally legal in the US when you scrape publicly available data. There is no federal law that bans it. However, how you scrape, what data you collect, and how you use it can create legal risk under the CFAA, copyright law, contract law, or state privacy regulations. The safest approach is to stick to public pages, avoid bypassing technical barriers, minimize personal data collection, and use the data for analysis rather than direct republication.

2. Can I go to jail for web scraping?

Criminal prosecution for web scraping is extremely rare and would typically require accessing data behind authentication barriers without authorization (a CFAA violation) or committing fraud. The DOJ's 2022 CFAA charging policy states that ordinary terms-of-service violations are not sufficient for criminal charges. Most web scraping disputes are civil matters — lawsuits, not criminal cases.

3. Does violating a website's Terms of Service make scraping illegal?

Not automatically. Violating a website's ToS is a contract issue, not a criminal offense. If you've agreed to clickwrap terms that prohibit scraping, the website could pursue a civil breach-of-contract claim. But browsewrap terms (linked in a footer) are much harder to enforce, especially if you never logged in or clicked "I agree." Courts have been skeptical of passive browsewrap enforcement in multiple scraping cases.

4. Is it legal to scrape personal data (emails, phone numbers) in the US?

It depends. Many US state privacy laws — including CCPA, VCDPA, CPA, and others — include exceptions for publicly available personal information, but definitions and downstream-use obligations vary. Scraping non-personal data (product prices, business listings, public records) carries much lower risk than scraping individual consumer profiles. If you're collecting PII at scale, check the applicable state laws and ensure you have a compliant purpose.

5. Did hiQ v. LinkedIn make all web scraping legal?

No. The hiQ ruling was a preliminary injunction — a temporary order based on the likelihood of success — not a final decision on the merits. The Ninth Circuit said accessing public data likely did not violate the CFAA, but the case settled in 2022 without a final court ruling. It does not grant blanket permission to scrape any website, and it does not address copyright, contract, privacy, or trade secret claims. It's encouraging for public-data scrapers, but it's not a legal guarantee.
