Is Web Scraping Legal in the US? What the Law Actually Says

Last Updated on April 29, 2026

A few weeks ago, a colleague on our sales team asked me a question I hear constantly: "Can we scrape leads from this public business directory, or will we get sued?" He'd found a goldmine of prospect data sitting right there on the open web — no login, no paywall — but a quick Google search had him convinced he might end up in handcuffs.

That kind of anxiety is everywhere. Automated traffic now accounts for a large share of all web activity, the web scraping software market is projected to keep growing, and yet most of the legal guidance floating around online is either outdated, oversimplified, or flat-out wrong. The hiQ v. LinkedIn case from 2022? Nearly every article treats it like a Supreme Court ruling that "all scraping is legal." (Spoiler: it isn't, and it wasn't.)

Meanwhile, major new cases in 2024 and 2025 — involving X (formerly Twitter), Meta, Reddit, Google, and AI companies — are actively reshaping the rules, and almost nobody is covering them. This guide covers what U.S. law actually says about web scraping in 2026, separates the myths from reality, and gives you a practical framework for figuring out what you can and can't do.

What Is Web Scraping (And Why Do Businesses Care)?

Web scraping is using automated software to collect information from websites and organize it into structured data — think spreadsheets, databases, or CRM records.

More precisely, a scraper visits web pages, reads the underlying HTML, and pulls out specific data points — prices, names, addresses, product specs, whatever you need — into neat rows and columns. It's the digital equivalent of hiring someone to copy information from a website into Excel, except a bot does it in seconds instead of hours.

Web scraping is NOT hacking. It accesses the same information any visitor would see in their browser.

And it's not some niche developer trick. Search engines, price comparison sites, real estate platforms, market research dashboards, and AI-powered tools all rely on web crawling and scraping to function. If you've ever used Google, checked a flight aggregator, or browsed Zillow, you've benefited from scraping.

The most common business use cases I run into:

  • Lead generation: Extracting company names, websites, job titles, or public contact details from business directories.
  • Competitor price monitoring: Ecommerce teams tracking rival SKU prices, availability, and shipping info.
  • Real estate intelligence: Aggregating public property listings, prices, and market trends.
  • Product research: Pulling product specs, ratings, availability, and category data from retail sites.
  • Market intelligence: Tracking job postings, store openings, news signals, or public financial data.

The technique itself is neutral. The legal analysis turns on how you access the data and what you do with it afterward.

Is Web Scraping Legal in the US? The Short Answer

There is no U.S. federal law that bans web scraping outright. Scraping publicly available data is generally permitted.

But — and this is a big but — legality depends on several factors: the type of data, how you access it, whether you agreed to any terms of service, whether the data includes personal information, and what you plan to do with it.

The biggest source of confusion in forums, Reddit threads, and even legal blogs? People conflate "illegal" with "against a website's terms of service." These are very different things. Breaking a website's rules might get your IP blocked or your account banned. Breaking a federal law could mean a lawsuit or, in rare cases, criminal prosecution. Most scraping consequences fall squarely in the civil category.

The rest of this article unpacks the key laws, the landmark court cases (including ones from 2024 and 2025 that almost nobody covers), and a practical decision framework you can actually use.

The Three Types of "Illegal": Criminal, Civil, and ToS Violations

Time to clear up the single biggest misconception about web scraping law. When someone asks "is web scraping illegal?", they're usually lumping together three completely different categories of risk. Separating them changes the whole conversation.

Type of Liability | What Triggers It | Potential Consequence | Severity
Criminal (CFAA) | Accessing data behind authentication barriers without authorization, fraud, credential misuse | Federal prosecution, fines, imprisonment | 🔴 Severe, but extremely rare for ordinary business scraping
Civil lawsuit | Copyright infringement, trespass to chattels, breach of contract, trade secret misappropriation, privacy violations | Monetary damages, injunctions, data deletion | 🟡 Significant
ToS violation | Breaching browsewrap or clickwrap terms of service | Account termination, IP blocking, cease-and-desist, possible civil suit | 🟢 Low to moderate

The Department of Justice's 2022 CFAA charging policy explicitly states that ordinary terms-of-service violations, like creating a fake account or violating website rules, are not by themselves sufficient for federal criminal charges. That's a big deal.

The practical takeaway: if you're a sales team scraping public business listings or an ecommerce team monitoring competitor prices, you're almost certainly looking at civil risk management, not criminal exposure. That doesn't mean you can ignore the rules, but it should recalibrate your anxiety level.

The Key US Laws That Apply to Web Scraping

Four legal pillars intersect with web scraping in the U.S., and each one addresses a different piece of the puzzle.

The Computer Fraud and Abuse Act (CFAA)

The CFAA was originally written to prosecute computer hacking. Over the years, it became the go-to statute for scraping lawsuits, usually under the theory that a scraper accessed a website "without authorization."

Then came Van Buren v. United States (2021). The Supreme Court held that a person "exceeds authorized access" under the CFAA only when they access areas of a computer (files, folders, databases) that are off-limits to them. Simply misusing information you're otherwise allowed to see doesn't count.

Scraping implications:

  • Lower CFAA risk: Public web pages available to anyone without a login. No gate, no "unauthorized access" problem.
  • Higher CFAA risk: Data behind logins, paywalls, access tokens, session manipulation, or revoked access.

The hiQ v. LinkedIn case (which we'll dissect in detail below) reinforced this for public data. But the CFAA is only one piece of the puzzle.

Copyright Law and the DMCA

U.S. copyright law protects original creative expression (articles, photos, videos, creative product descriptions) but not facts. The Supreme Court's Feist Publications v. Rural Telephone Service (1991) is the landmark case here: facts like names, addresses, and phone numbers are not copyrightable, no matter how much effort went into compiling them.

Risk tiers for scraped data:

What You're Scraping | Copyright Risk | Why
Prices, product names, addresses, dates, specs | Lower | These are facts
Full articles, photos, videos, creative reviews | Higher | These are expressive works
Curated databases, rankings, editorial taxonomies | Medium-high | Selection and arrangement may be protected
Paywalled or DRM-protected content | High | Copyright plus access-control issues

The Digital Millennium Copyright Act (DMCA) adds another layer: bypassing technical protection measures (paywalls, DRM, certain anti-bot systems) to access copyrighted content can trigger liability even if you never copy the content itself. This is being tested aggressively in 2025-2026 cases, including Google v. SerpApi, where Google alleges DMCA violations for circumventing its SearchGuard anti-bot system.

Fair use matters too: transformative use (analyzing, aggregating, or building on data rather than just republishing it) is generally safer than copying and reposting someone else's content.

Contract Law: Terms of Service (Browsewrap vs. Clickwrap)

Many websites include anti-scraping language in their terms of service — but enforceability depends entirely on how you encountered those terms.

Contract Type | Enforceability | What It Means for Scrapers
Clickwrap (you click "I agree") | Strong | Courts consistently enforce these. Anti-scraping terms can support civil claims.
Sign-in wrap (notice near login) | Fact-specific | Depends on how conspicuous the notice was.
Browsewrap (linked in footer) | Weaker | Courts are skeptical when users had no real notice.
Account/API terms | Stronger | Logged-in scraping or API misuse is much higher risk.

In Meta v. Bright Data, the court found that Meta's terms didn't cover logged-out public scraping in the way Meta argued: Bright Data hadn't been shown to use logged-in accounts for the public scraping at issue. That's a meaningful distinction.

Practical advice: if you never logged in, never clicked "I agree," and are scraping only public pages, browsewrap restrictions are harder for a website to enforce against you. But always check the ToS before scraping, especially if you've created an account.

US State Privacy Laws (CCPA and Beyond)

If the data you're scraping includes personal information (names, emails, phone numbers, location data), state privacy laws may apply. And the patchwork is growing fast: the IAPP tracks a steadily lengthening list of comprehensive state privacy laws.

Most of these laws include exceptions for "publicly available" personal information, but the definitions vary. And downstream use — selling, sharing, or profiling with that data — can still trigger obligations even if the initial collection is exempt.

State Law | Effective | Covers Scraped PII? | Opt-Out Requirement | Penalty Range
CCPA/CPRA (California) | 2020/2023 | Yes | Sale/sharing opt-out; GPC recognized | $2,663–$7,988/violation (2025 adjusted)
CPA (Colorado) | 2023 | Yes | Universal opt-out/GPC from July 2024 | Civil penalties under deceptive trade practice framework
CTDPA (Connecticut) | 2023 | Yes | Opt-out preference signal (OOPS)/GPC from Jan. 2025 | Up to $5,000/willful violation
VCDPA (Virginia) | 2023 | Yes | Opt-out right | Up to $7,500/violation
TDPSA (Texas) | 2024 | Yes | Universal opt-out from Jan. 2025 | Up to $7,500/violation
Other enacted state laws (listed below) | Varies | Varies | Varies | Varies

Additional states with enacted laws include Utah, Oregon, Montana, Delaware, Iowa, Nebraska, New Hampshire, New Jersey, Tennessee, Minnesota, Maryland, Indiana, Kentucky, and Rhode Island. Alabama enacted a law effective May 1, 2027.

For business users scraping product prices, business listings, or market data (non-PII, factual information), privacy risk is substantially lower. Tools like Thunderbit focus on structured extraction from public pages (product data, business directories, real estate listings), which aligns with the lowest-risk scraping category.

Landmark Web Scraping Cases: A Timeline from 2000 to 2026

This is where I think most guides on this topic fall short. Nearly every article stops at hiQ v. LinkedIn (2022) and ignores the rulings that are actively shaping scraping law right now. Here's the full timeline:

Case | Year | Key Holding | Impact on Scrapers
eBay v. Bidder's Edge | 2000 | Preliminary injunction under trespass to chattels; crawler burden on servers mattered | ⚠️ High-volume scraping that burdens servers can create civil liability
Facebook v. Power Ventures | 2016 | CFAA liability after cease-and-desist and continued access using Facebook systems | ⚠️ C&D plus authenticated/gated access is high risk
Van Buren v. US | 2021 | CFAA "exceeds authorized access" requires accessing off-limits computer areas | ✅ Narrowed CFAA scope significantly
hiQ v. LinkedIn | 2022 | Accessing public data not a CFAA violation (preliminary injunction, later settled) | ✅ Public data ≠ "unauthorized access", but not a final ruling
Meta v. Bright Data | 2024 | Bright Data won summary judgment on Meta's contract theory for logged-out public scraping | ✅ Terms may not bind logged-out scraping absent assent
X Corp. v. Bright Data | 2024 | May dismissal of many claims; November order denied scraping/selling-based claims | ✅ Public data copying claims weakened
Compulife v. Newman/Rutstein | 2024-2025 | Trade-secret liability for mass extraction of insurance quote data; cert denied Feb. 2025 | ⚠️ Public-facing data can still be a protected database
Reddit v. Perplexity/SerpApi/Oxylabs/AWMProxy | 2025-2026 | Alleges industrial-scale indirect scraping through Google results | ⚠️ AI-era cases target data supply chains
Google v. SerpApi | 2025-2026 | DMCA §1201 claims over alleged anti-bot circumvention | ⚠️ Tests whether anti-bot systems are DMCA access controls

The trend line is clear: courts are increasingly protecting access to public data under the CFAA, but copyright, contract, privacy, trade secret, and infrastructure claims remain fully independent risks. And the AI training wave is creating entirely new legal questions.

Setting the Record Straight: What hiQ v. LinkedIn Actually Decided

This is the most misunderstood case in all of web scraping law. I've seen it cited in blog posts, Reddit threads, and even legal summaries as proof that "public web scraping is legal." It's not that simple.

Here's what actually happened:

What hiQ held: The Ninth Circuit affirmed a preliminary injunction (a temporary order) preventing LinkedIn from blocking hiQ's scraping of public LinkedIn profiles. The court said that accessing publicly available data likely did not violate the CFAA. Key word: likely.

What hiQ did NOT establish:

  • A blanket right to scrape any public website
  • A final ruling on the merits: the Supreme Court vacated and remanded after Van Buren, the Ninth Circuit reaffirmed, and then the case settled without a final court decision. The reported settlement included $500,000, an injunction, and data/software destruction obligations.

Why this matters for you: hiQ is encouraging for scrapers of public data. It signals that courts are wary of platforms creating private monopolies over information they don't own. But it's not a legal guarantee. Other claims — copyright, contract, privacy, trade secrets — were never resolved. Post-Van Buren, the CFAA landscape is clearer, but relying solely on hiQ as a legal shield would be a mistake.

Getting this right is what separates informed risk management from wishful thinking.

Can I Legally Scrape This? A Practical Decision Flowchart

Scraping legality feels like a "grey area" — I hear that constantly. So instead of more legal theory, here's a decision framework you can actually use. Five questions, any scraping project:

1. Is the data publicly accessible (no login required)?

  • If NO → Higher CFAA risk. Seek permission or legal review before proceeding.
  • If YES → Move to question 2.

2. Are you bypassing any technical barriers (CAPTCHA, IP blocks, rate limits, paywalls)?

  • If YES → Potential DMCA and CFAA issues. Stop or escalate to legal counsel.
  • If NO → Move to question 3.

3. Did you agree to a clickwrap ToS that prohibits scraping?

  • If YES → Civil contract liability risk. Consider whether the data is available from another source or seek permission.
  • If NO → Move to question 4.

4. Does the data include personal information (PII)?

  • If YES → Check CCPA and applicable state privacy laws. Ensure you have a compliant use case and respect opt-out rights.
  • If NO → Move to question 5.

5. What will you do with the data?

  • Commercial republication of copyrighted content (full articles, photos, videos) → Copyright risk.
  • Transformative analysis, internal research, or factual data use (prices, specs, listings) → Generally lower risk.

If you land in the "public pages, no bypass, no clickwrap, non-PII, factual data for internal analysis" zone, you're in the lowest-risk category. That's exactly the kind of workflow Thunderbit is designed for: extracting structured, factual data from public web pages like product listings, business directories, and real estate data, then exporting to Excel, Google Sheets, Airtable, or Notion for your own analysis.

Bookmark this flowchart. It won't replace a lawyer, but it'll save you from a lot of unnecessary panic.
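
If you'd rather keep the five questions next to your scraping scripts, they can be sketched as a small checklist function. This is a minimal illustration, not legal advice: the `ScrapeProject` fields, the `risk_flags` name, and the flag wording are all hypothetical. Unlike the flowchart, the sketch evaluates every question instead of stopping at the first problem, so you see all the open issues at once.

```python
from dataclasses import dataclass


@dataclass
class ScrapeProject:
    """Answers to the five flowchart questions (hypothetical field names)."""
    public_without_login: bool          # Q1: is the data publicly accessible?
    bypasses_technical_barriers: bool   # Q2: CAPTCHA, IP blocks, rate limits, paywalls
    agreed_to_clickwrap_ban: bool       # Q3: clicked "I agree" to anti-scraping terms
    includes_pii: bool                  # Q4: names, emails, phone numbers, etc.
    republishes_copyrighted_content: bool  # Q5: reposting articles, photos, videos


def risk_flags(project: ScrapeProject) -> list[str]:
    """Return one flag per question that lands on the higher-risk branch."""
    flags = []
    if not project.public_without_login:
        flags.append("Q1: login required, higher CFAA risk; seek permission or legal review")
    if project.bypasses_technical_barriers:
        flags.append("Q2: bypassing barriers, potential DMCA/CFAA issues; stop or escalate")
    if project.agreed_to_clickwrap_ban:
        flags.append("Q3: clickwrap ban accepted, civil contract risk; seek another source or permission")
    if project.includes_pii:
        flags.append("Q4: personal data, check CCPA and other state privacy laws")
    if project.republishes_copyrighted_content:
        flags.append("Q5: republishing copyrighted content, copyright risk")
    return flags


# Example: public price monitoring for internal analysis raises no flags.
price_monitoring = ScrapeProject(True, False, False, False, False)
print(risk_flags(price_monitoring) or "Lowest-risk zone")
```

An empty list corresponds to the lowest-risk zone; anything else is a prompt to pause and get advice, not a verdict.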

Web Scraping for AI Training: An Unsettled Frontier

AI has added an entirely new layer of complexity to scraping law. Scraping data to train large language models, image generators, and other AI systems is now a major legal battleground — and the courts haven't settled the key questions yet.

Here's where things stand:

Case | Status (2026) | Key Issue
NYT v. OpenAI/Microsoft | Ongoing. Core copyright claims allowed to proceed in April 2025; discovery disputes include 20M+ ChatGPT logs. | Does training on scraped news articles constitute fair use or copyright infringement?
Bartz v. Anthropic | Judge Alsup held certain training uses were fair use, but pirated source acquisition was not. Reported settlement: ~$1.5B. | Training may be transformative, but pirated source copying is a separate problem.
Thomson Reuters v. Ross | Delaware court rejected fair use for use of Westlaw headnotes to build a competing legal research product. | Direct substitute products face higher copyright risk.
Getty v. Stability AI | UK case largely favored Stability in 2025; U.S. case pending. | Image-training law remains unsettled.

The adds useful nuance: training on large, diverse datasets may often be transformative, but pirate-source copying and uses that directly compete with copyright owners' markets are much weaker fair use arguments.

For most business users reading this article, the distinction is straightforward: scraping data for your own analysis or business operations (lead gen, price monitoring, market research) is a very different legal animal than scraping data to train and commercialize an AI model. The former carries lower copyright risk. The latter is where the big lawsuits are happening.

How to Scrape Data Responsibly (Best Practices for Business Teams)

Enough law. Here's how to actually scrape data without creating legal headaches for your team.

Stick to Publicly Available Data

Focus on data that anyone can see without logging in — product listings, business directories, public records, pricing pages. The moment you're behind a login, you've moved into a higher-risk zone.

Don't Bypass Technical Barriers

If a site uses CAPTCHAs, IP blocks, rate limits, or paywalls, those are signals. Circumventing them can trigger DMCA, CFAA, or contract claims. If the data is important enough, look for an official API or data partnership instead.

Check the Terms of Service

Especially if you've created an account or clicked "I agree." Read the ToS for anti-scraping clauses. If the terms prohibit scraping and you've agreed to them, consider whether the data is available from another source.

Minimize Personal Data Collection

If you're collecting PII (names, emails, phone numbers), make sure you have a compliant use case under applicable state privacy laws. Scraping factual business data — company names, product prices, listing details — is substantially lower risk than scraping individual consumer profiles.
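
One way to put data minimization into practice is to drop personal fields before a record ever reaches your spreadsheet or CRM. The sketch below is only an illustration: the field names in `PERSONAL_FIELDS` and the sample record are hypothetical, and which fields count as personal information depends on the laws that apply to you.

```python
# Hypothetical field names; adjust to whatever your own export uses.
PERSONAL_FIELDS = frozenset({"contact_name", "email", "phone"})


def minimize(record: dict, personal_fields=PERSONAL_FIELDS) -> dict:
    """Return a copy of the record without the personal-data fields."""
    return {key: value for key, value in record.items() if key not in personal_fields}


listing = {
    "company": "Example Plumbing LLC",
    "website": "https://example.com",
    "category": "Plumbing",
    "contact_name": "Jane Doe",
    "email": "jane@example.com",
}

# Only the business-level, factual fields are kept.
print(minimize(listing))
```

Filtering at collection time, rather than after export, means the personal fields are never stored in the first place.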

Respect Robots.txt and Rate Limits

Robots.txt isn't legally binding on its own, but respecting it demonstrates good faith. And don't hammer a website's servers: throttle your requests, use reasonable intervals, and don't cause infrastructure harm.
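
For teams that do write their own scripts, here is a minimal sketch of what "respect robots.txt and throttle" can look like using Python's standard `urllib.robotparser`. The robots.txt content, the `example-bot` user agent, the URLs, and the two-second default delay are assumptions for illustration; the `fetch` function is passed in by the caller, so the sketch itself makes no network requests.

```python
import time
from typing import Callable
from urllib.robotparser import RobotFileParser


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_crawl(
    parser: RobotFileParser,
    user_agent: str,
    urls: list[str],
    fetch: Callable[[str], str],
    default_delay: float = 2.0,              # assumed fallback interval, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, str]:
    """Fetch only URLs robots.txt allows, pausing between requests."""
    # Use the site's Crawl-delay if it declares one, else the fallback.
    declared = parser.crawl_delay(user_agent)
    delay = float(declared) if declared is not None else default_delay

    results = {}
    for url in urls:
        if not parser.can_fetch(user_agent, url):
            continue  # disallowed path: skip it rather than work around it
        results[url] = fetch(url)
        sleep(delay)
    return results


# Hypothetical robots.txt for illustration.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

rules = load_rules(EXAMPLE_ROBOTS)
print(rules.can_fetch("example-bot", "https://example.com/listings"))
print(rules.can_fetch("example-bot", "https://example.com/private/data"))
```

Injecting `fetch` and `sleep` keeps the throttling logic separate from the HTTP client, so the same loop works with whatever request library your team already uses.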

Use Data for Analysis, Not Republication

Transformative use — analysis, aggregation, internal research, competitive intelligence — is far safer than copying and reposting someone else's articles, images, or reviews. If you're building dashboards or spreadsheets for your team, you're in a better position than if you're republishing scraped content on your own website.

Choose Tools Designed for Compliant Scraping

This is where I'll mention what we've built at Thunderbit. Our tool is designed for business users who want to extract structured data from public web pages (product listings, business directories, real estate data, lead information) without needing to write code or bypass technical barriers. The AI reads the page, suggests fields, and lets you export to Excel, Google Sheets, Airtable, or Notion. It's built for the lowest-risk branch of the decision flowchart above: public pages, factual data, no login bypass.

That said, no tool makes you immune from legal risk. The responsibility for what you scrape and how you use it always rests with you.

Keep Logs and Stop on Cease-and-Desist

Document your scraping activity and business purpose. If you receive a cease-and-desist letter, stop and consult legal counsel. Continuing to scrape after formal notice raises your risk profile significantly, especially if gated systems are involved.
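
A lightweight way to document activity is an append-only log with one JSON record per request, plus a list of domains you have stopped scraping after a formal notice. The sketch below is one possible shape, assuming hypothetical function names, record fields, and an `example.com` domain; it is not a compliance standard.

```python
import json
from datetime import datetime, timezone
from urllib.parse import urlparse


def make_log_entry(url: str, purpose: str, status: int, when: datetime = None) -> str:
    """Build one JSON line recording what was fetched, when, and why."""
    when = when or datetime.now(timezone.utc)
    record = {
        "timestamp": when.isoformat(),
        "url": url,
        "purpose": purpose,   # the business reason, in plain words
        "status": status,     # HTTP status code returned
    }
    return json.dumps(record, sort_keys=True)


def append_entry(path: str, entry: str) -> None:
    """Append the entry to a JSON-lines log file."""
    with open(path, "a", encoding="utf-8") as log_file:
        log_file.write(entry + "\n")


def is_halted(url: str, halted_domains: set) -> bool:
    """True if the URL belongs to a domain you have stopped scraping."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in halted_domains)


# Domains added here after a cease-and-desist are skipped from then on.
halted = {"example.com"}
print(is_halted("https://www.example.com/listings", halted))
print(make_log_entry("https://example.org/prices", "competitor price monitoring", 200))
```

Checking `is_halted` before every request makes "stop on cease-and-desist" a property of the script rather than something a teammate has to remember.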

Key Takeaways on Web Scraping Legality in the US

The short version:

  • No US federal law bans web scraping. Scraping publicly available factual data is generally permitted.
  • Legality depends on what you scrape, how you access it, and what you do with it. Public pages + factual data + internal analysis = lowest risk.
  • The CFAA's scope has narrowed after Van Buren and hiQ, but copyright, contract, privacy, and trade secret claims are independent risks that still apply.
  • Criminal liability is rare for typical business scraping. Most risks are civil — lawsuits, not handcuffs.
  • hiQ v. LinkedIn is not a blanket permission slip. It was a preliminary injunction that later settled. Encouraging, but not a guarantee.
  • State privacy laws matter when PII is involved, but non-PII data (prices, listings, specs) carries the lowest risk.
  • AI training use cases are a new and unsettled legal frontier. Business scraping for your own analysis is a different risk profile than scraping to build commercial AI models.
  • Following best practices — public data, respect ToS, avoid PII, don't bypass barriers, use data responsibly — keeps your team in the safe zone.

A necessary disclaimer: this article is informational, not legal advice. If you're planning a large-scale scraping operation or dealing with sensitive data, consult a qualified attorney. But for the sales manager who just wants to pull leads from a public directory, or the ecommerce team monitoring competitor prices? The law is more on your side than you probably think.

If you want to see how Thunderbit makes this kind of public-data extraction simple (no code, no bypass, just structured data into your workflow), try it yourself.

FAQs

1. Is web scraping legal in the US in 2026?

Yes, web scraping is generally legal in the US when you scrape publicly available data. There is no federal law that bans it. However, how you scrape, what data you collect, and how you use it can create legal risk under the CFAA, copyright law, contract law, or state privacy regulations. The safest approach is to stick to public pages, avoid bypassing technical barriers, minimize personal data collection, and use the data for analysis rather than direct republication.

2. Can I go to jail for web scraping?

Criminal prosecution for web scraping is extremely rare and would typically require accessing data behind authentication barriers without authorization (a CFAA violation) or committing fraud. The DOJ's 2022 CFAA charging policy states that ordinary terms-of-service violations are not sufficient for criminal charges. Most web scraping disputes are civil matters — lawsuits, not criminal cases.

3. Does violating a website's Terms of Service make scraping illegal?

Not automatically. Violating a website's ToS is a contract issue, not a criminal offense. If you've agreed to clickwrap terms that prohibit scraping, the website could pursue a civil breach-of-contract claim. But browsewrap terms (linked in a footer) are much harder to enforce, especially if you never logged in or clicked "I agree." Courts have been skeptical of passive browsewrap enforcement in multiple scraping cases.

4. Is it legal to scrape personal data (emails, phone numbers) in the US?

It depends. Many US state privacy laws — including CCPA, VCDPA, CPA, and others — include exceptions for publicly available personal information, but definitions and downstream-use obligations vary. Scraping non-personal data (product prices, business listings, public records) carries much lower risk than scraping individual consumer profiles. If you're collecting PII at scale, check the applicable state laws and ensure you have a compliant purpose.

5. Did hiQ v. LinkedIn make all web scraping legal?

No. The hiQ ruling was a preliminary injunction — a temporary order based on the likelihood of success — not a final decision on the merits. The Ninth Circuit said accessing public data likely did not violate the CFAA, but the case settled in 2022 without a final court ruling. It does not grant blanket permission to scrape any website, and it does not address copyright, contract, privacy, or trade secret claims. It's encouraging for public-data scrapers, but it's not a legal guarantee.

Fawad Khan
Fawad writes for a living, and honestly, he kind of loves it. He's spent years figuring out what makes a line of copy stick — and what makes readers scroll past. Ask him about marketing, and he'll talk for hours. Ask him about carbonara, and he'll talk longer.