Is Web Scraping Legal in Europe? How to Scrape and Stay Safe

On May 1, 2024, the Dutch Data Protection Authority dropped a headline that rattled every data team in Europe: “scraping is almost always illegal.” If you work in sales, ecommerce, or real estate — basically anyone who relies on web data — that phrase probably made your stomach drop.

I get it. At Thunderbit, we talk to business teams every day who need web data for price monitoring, lead generation, and market research. The frustration is always the same: they Google "is web scraping legal in Europe," and every answer is some variation of "it depends." That's not helpful when you have a project deadline and a list of URLs to scrape.

So I spent weeks digging into the actual regulations, DPA guidance, enforcement records, and case law to build something more useful: a practical decision checklist, a consolidated safeguards table, real fine amounts, and a step-by-step guide to scraping European websites without ending up on the wrong side of a regulator. Whether you're scraping Amazon product prices or pulling B2B contacts from a directory, this article will help you figure out where the lines are — and how to stay on the right side of them.

What Is Web Scraping (and Why Should European Businesses Care)?

Web scraping is the automated extraction of data from websites into a structured format — a spreadsheet, a database, a CRM. Instead of copy-pasting product names and prices from 200 pages, a scraper visits each page and pulls the fields you need into neat columns.

Why does this matter for non-technical teams? Because web data powers real business decisions. Sales teams scrape directories for leads. Ecommerce managers monitor competitor prices daily. Real estate analysts track listing trends across portals. Market researchers collect public reviews and ratings at scale. For many of these teams, scraping is a routine, often daily workflow rather than a one-off project.

But Europe's regulatory environment is different from the US. The GDPR, the Database Directive, and evolving DPA guidance mean that "publicly available" does not equal "free to use." As Dutch DPA chairman Aleid Wolfsen put it: "public does not automatically mean permission for scraping." Understanding the rules before you start isn't optional — it's the difference between a clean dataset and a six-figure fine.


Is Web Scraping Legal in Europe? The Short Answer

Web scraping is not inherently illegal in Europe. But its legality depends on three things: what data you scrape, how you scrape it, and why.

Three overlapping legal layers govern scraping in the EU:

GDPR — applies whenever you scrape personal data (names, emails, phone numbers, IP addresses, even pseudonymized identifiers).
The EU Database Directive — protects databases where the creator made a "substantial investment" in organizing data.
Contract/Terms of Service law: many websites explicitly prohibit scraping in their ToS, and the CJEU has confirmed that EU database law does not prevent such contractual restrictions from applying.

The critical point: "public" does not mean "unregulated." Even non-personal data can be protected under database rights or contract law. Every scraping project requires looking at all three layers together.

The Key EU Laws That Govern Web Scraping

GDPR: When You Scrape Personal Data

Any data tied to an identifiable person triggers GDPR obligations. That includes names, email addresses, phone numbers, IP addresses, photos, and even pseudonymized data that can be re-identified. The moment you scrape personal data, you become a "data controller" with duties under the GDPR:

Lawful basis (Article 6): You need a legal reason to process the data. Consent is almost never practical for scraping at scale — you can't ask millions of people for permission before collecting their publicly posted info. The most commonly cited basis is legitimate interest (Article 6(1)(f)), but it requires a documented three-part test: (1) your interest is legitimate, (2) the processing is necessary, and (3) it doesn't disproportionately affect the data subjects' rights, considering their reasonable expectations.
Transparency (Article 14): Since you're not collecting data directly from the person, you must inform them within a reasonable period, and at the latest within one month, about what you collected, why, and how they can exercise their rights. If individual notice would involve disproportionate effort (Article 14(5)(b)), you must instead make the information publicly available, typically as a general notice with all the Article 14 content.
Data minimization: Only collect what you actually need. If you want product prices, don't also grab seller email addresses.
Storage limits and rights management: Set retention periods, honor erasure requests, and provide access to source information.
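The one-month ceiling in Article 14 is easy to miss when scrapes run continuously, so some teams compute a notice deadline per collection date. The sketch below is a minimal illustration under stated assumptions: it reads "one month" as the same calendar day in the following month, clamped to the last day of a shorter month, and it ignores the earlier triggers in Article 14(3)(b) and (c), such as first contacting the person. The function names are hypothetical.

```python
import calendar
from datetime import date


def add_one_month(d: date) -> date:
    """Same day-of-month one calendar month later, clamped to the month's last day."""
    year = d.year + 1 if d.month == 12 else d.year
    month = 1 if d.month == 12 else d.month + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def art14_notice_deadline(collected_on: date) -> date:
    # Assumption: Article 14(3)(a) "at the latest within one month" is read as
    # the same calendar date next month. Earlier triggers (first communication,
    # disclosure to a recipient) are NOT modelled here and may shorten the deadline.
    return add_one_month(collected_on)


print(art14_notice_deadline(date(2024, 1, 31)))  # 2024-02-29 (leap year clamp)
```

Whether your DPA counts the period exactly this way is a legal question; the code only makes your chosen reading explicit and repeatable.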

The EDPB ChatGPT Task Force report (adopted May 2024) added another layer: it said different processing stages — collection, preprocessing, training, prompts, and output — each need their own legal basis analysis. The EDPB didn't reject legitimate interest for web scraping, but it insisted on the full three-part assessment with appropriate safeguards.

The EU Database Directive: Protecting How Data Is Organized

The Database Directive gives a sui generis right to database creators who made a "substantial investment" in obtaining, verifying, or presenting their data. If your scraping extracts a "substantial part" of such a database, you may infringe that right.

In practice, the bar is relatively high. Scraping a few hundred product prices from a large retailer is unlikely to qualify. But bulk-downloading an entire competitor's catalog (tens of thousands of listings) could cross the line, especially if it threatens the creator's ability to recoup their investment. The Court of Justice of the EU has refined this threshold over several cases; in CV-Online Latvia v Melons (C-762/19, 2021), it held that extraction infringes the right only where it risks undermining the maker's investment, which is why scale and impact matter more than the mere act of scraping.

For most business scraping — pulling specific fields from product pages, comparing listings across a category — the Database Directive is lower risk. But it's not zero risk, and it's worth keeping in mind when you're designing your scraping scope.

Terms of Service: The Contract Law Wildcard

This one trips people up. Many websites prohibit scraping in their Terms of Service. In Europe, violating ToS is a civil matter (not criminal), but it can still lead to injunctions, contract lawsuits, and real financial exposure.

Two flavors to know: browsewrap (passive terms, often a link buried at the bottom of the page) is harder to enforce because the user never actively agreed. Clickwrap (where you check a box or click "I agree") is much more enforceable.

The landmark EU case is Ryanair v. PR Aviation (C-30/14, 2015): the CJEU held that when a database is protected neither by copyright nor by the sui generis right, the Database Directive does not stop its owner from restricting use through contractual terms, leaving Ryanair free to rely on its ToS against a price-comparison scraper under national contract law. So: always review a site's ToS before scraping. If it's a clickwrap agreement that explicitly prohibits scraping, proceed with caution, or look for API access instead.

The DSM Directive and AI Act: Exceptions for Research and Text/Data Mining

Not all scraping triggers the same restrictions. The Digital Single Market (DSM) Directive (2019) introduced two text and data mining (TDM) exceptions:

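Article 3: Research organizations and cultural heritage institutions can conduct TDM for scientific research on content they can lawfully access.
Article 4: Anyone, including commercial entities, can conduct TDM on lawfully accessible content unless the rights holder has expressly reserved those rights; for content published online, the reservation should be machine-readable (e.g., via robots.txt, ai.txt, or TDMRep headers).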

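The EU AI Act (Article 53) adds obligations for providers of general-purpose AI models: they must put in place a policy to comply with EU copyright law, including identifying and respecting TDM opt-outs, and publish a sufficiently detailed summary of the content used for training.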

One catch: these exceptions cover copyright and database rights, not GDPR. If your TDM involves personal data, you still need a separate GDPR legal basis.

The "Can I Scrape This?" Decision Checklist for European Data

This is the section I wish existed when I first started researching this topic. Every legal article says "it depends" — but what does the decision tree actually look like? Here's a step-by-step compliance checklist with clear gates. Each step leads to ✅ proceed, ⚠️ add safeguards, or 🛑 stop.

Step 1: Is the Data Personal or Non-Personal?

Non-personal data (product prices, SKU numbers, business addresses not linked to individuals): lower regulatory burden. You still need to check Database Directive and ToS, but GDPR doesn't apply. ✅ Proceed to Step 3.

Personal data (names, emails, phone numbers, photos, any identifier linked to a person): GDPR applies. ⚠️ Continue to Step 2.

Step 2: Which GDPR Legal Basis Applies?

Consent: Almost never feasible for scraping at scale. 🛑 Rule it out unless you have a very narrow, specific scenario.
Legitimate interest (Article 6(1)(f)): The most common basis. But it requires a documented three-part test:
1. Your interest is legitimate (commercial interest can qualify, per the CJEU's 2024 ruling in C-621/22).
2. The processing is necessary for that interest.
3. The balancing test: your interest doesn't override the data subjects' rights, considering their reasonable expectations.
Document your balancing test before scraping. If you can't articulate why the people whose data you're scraping would reasonably expect this use, that's a red flag. ⚠️ Proceed with documented legitimate interest.

Step 3: Does the Site's ToS Restrict Scraping?

Clickwrap agreement that prohibits scraping: 🛑 High risk. Consider alternative data sources or official API access.
Browsewrap or no ToS restriction: ⚠️ Lower risk, but still respect robots.txt and technical opposition signals.

Step 4: Does the Database Directive Apply?

Is the target a database with substantial investment in data organization?
Would your scraping extract a "substantial part" of that database?
If yes to both: ⚠️ Risk of sui generis infringement. Limit your extraction scope.

Step 5: Are You Covered by a Research or TDM Exception?

Research organization or cultural heritage institution mining data for scientific research? DSM Directive Article 3 may apply. ✅
Commercial TDM? Check for Article 4 opt-out signals (robots.txt, ai.txt, TDMRep). If the site has opted out, 🛑 stop for that source.
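Checking for Article 4 opt-out signals can be partly automated. The sketch below assumes you have already fetched a page's response headers and the site's robots.txt as text. It treats a TDMRep `tdm-reservation: 1` header, or a robots.txt rule blocking your crawler's user agent, as an opt-out. The function name and the "robots.txt block counts as opt-out" rule are our own simplification; ai.txt and the TDMRep `/.well-known/tdmrep.json` file are not checked here.

```python
from urllib.robotparser import RobotFileParser


def tdm_opted_out(headers: dict[str, str], robots_txt: str, url: str, user_agent: str) -> bool:
    """Return True if a TDM opt-out signal is found for this URL and user agent."""
    # TDMRep signals a reservation of TDM rights with the header "tdm-reservation: 1".
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    if normalized.get("tdm-reservation") == "1":
        return True
    # Treat a robots.txt disallow for our user agent as an opt-out as well.
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


robots = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
print(tdm_opted_out({}, robots, "https://example.com/p/1", "GPTBot"))  # True
```

A `False` result only means none of these particular signals was found; it does not settle copyright, database-right, or GDPR questions.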

Step 6: Have You Applied DPA-Recommended Safeguards?

If you've passed the gates above, the final step is implementing the safeguards that CNIL, the Dutch DPA, and the EDPB recommend. These are covered in detail in the next section. ✅ Proceed with safeguards in place.
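If you run many projects, encoding the six gates as a small function keeps every new project going through the same questions. This is a minimal sketch under stated assumptions: the `Project` fields, the rule that any stop outranks any warning, and the wording of the reasons are our own simplification; consent is not modelled, and a "proceed" result is not legal clearance.

```python
from dataclasses import dataclass


@dataclass
class Project:
    """Hypothetical yes/no answers to the six checklist steps."""
    personal_data: bool                   # Step 1
    legitimate_interest_documented: bool  # Step 2 (consent is not modelled)
    clickwrap_bans_scraping: bool         # Step 3
    browsewrap_bans_scraping: bool        # Step 3
    substantial_part_of_database: bool    # Step 4
    commercial_tdm: bool                  # Step 5
    tdm_opt_out_present: bool             # Step 5
    safeguards_in_place: bool             # Step 6


def evaluate(p: Project) -> tuple[str, list[str]]:
    """Return ("stop" | "add safeguards" | "proceed", reasons)."""
    stops: list[str] = []
    warnings: list[str] = []
    if p.personal_data:
        if p.legitimate_interest_documented:
            warnings.append("GDPR applies: keep the balancing test on file")
        else:
            stops.append("personal data without a documented legal basis")
    if p.clickwrap_bans_scraping:
        stops.append("clickwrap ToS prohibits scraping")
    elif p.browsewrap_bans_scraping:
        warnings.append("browsewrap ToS restriction")
    if p.substantial_part_of_database:
        warnings.append("possible sui generis infringement: narrow the scope")
    if p.commercial_tdm and p.tdm_opt_out_present:
        stops.append("rights holder has opted out of TDM")
    if stops:
        return "stop", stops
    if warnings and not p.safeguards_in_place:
        return "add safeguards", warnings
    return "proceed", warnings


leads = Project(personal_data=True, legitimate_interest_documented=True,
                clickwrap_bans_scraping=False, browsewrap_bans_scraping=False,
                substantial_part_of_database=False, commercial_tdm=False,
                tdm_opt_out_present=False, safeguards_in_place=False)
print(evaluate(leads))
# ('add safeguards', ['GDPR applies: keep the balancing test on file'])
```

Returning the reasons alongside the verdict makes the output easy to paste into the project file you keep for accountability.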

DPA Compliance Safeguards: What CNIL, Dutch DPA, and EDPB Recommend

No single competitor article I found consolidates the safeguards from Europe's three most active regulators on scraping. So I built this table by cross-referencing the CNIL web scraping focus sheet, the Dutch AP guidance, and the EDPB ChatGPT Task Force report.

Safeguard	CNIL	Dutch DPA (AP)	EDPB Task Force	Implementation Tips
Art. 14 transparency notice	✅ Required	✅ Required	✅ Required	Publish a public notice listing source categories, purposes, legal basis, retention, rights channels, and DPO contact
DPIA before scraping	✅ Recommended (mandatory if high-risk)	✅ Required	✅ Required	Document balancing test, data categories, risks, and mitigation measures before launch
Data minimization	✅ Required (define precise collection criteria)	✅ Required	✅ Required	Configure scraper to extract only needed fields; delete irrelevant data immediately
Rate limiting / robots.txt respect	✅ Required (exclude sites that object via robots.txt/CAPTCHA)	—	—	Parse robots.txt, add delays between requests, identify your user agent
Pseudonymization / anonymization	⚠️ Recommended (immediately after collection)	✅ Strongly urged	✅ Recommended	Hash or randomize IDs; remove profile URLs; blur faces where identity isn't needed
Retention period	✅ Defined limit	✅ Short as possible	✅ Defined limit	Automate deletion schedules; separate raw cache from extracted facts
Opt-out / blacklist mechanism	✅ Recommended (discretionary prior objection)	✅ Required (Art. 21 objection)	✅ Required	Provide opt-out form, domain blacklist, person-level suppression
Exclude sensitive sources	✅ Required (health forums, minors' sites, pornographic sites, genealogy)	✅ Required	✅ Required	Maintain default blocklists for health, religion, politics, biometrics, minors
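The pseudonymization row suggests hashing IDs. A plain, unkeyed hash of an email address can often be reversed by hashing candidate addresses, so a keyed hash (HMAC) is the more common choice. The sketch below assumes hypothetical field names (`name`, `email`, `profile_url`) and a placeholder key; keep the real key outside the dataset. Note that the result is still pseudonymous personal data under GDPR, because whoever holds the key can re-link the tokens.

```python
import hashlib
import hmac

# Hypothetical placeholder: load a long random secret from a secrets store instead.
PSEUDONYM_KEY = b"replace-with-a-long-random-secret"

# Hypothetical direct identifiers to drop from each scraped record.
DIRECT_IDENTIFIERS = {"name", "email", "profile_url"}


def pseudonymize(identifier: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Keyed HMAC-SHA256 token: the same normalized input always maps to the same token."""
    normalized = identifier.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


def strip_direct_identifiers(record: dict) -> dict:
    """Drop direct identifiers, keeping a stable token so records can still be deduplicated."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "email" in record:
        cleaned["person_token"] = pseudonymize(record["email"])
    return cleaned


print(strip_direct_identifiers({"name": "Ana", "email": "ana@example.com", "city": "Lyon"}))
```

Running this immediately after collection, as CNIL recommends, means raw identifiers never reach your long-term store.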

A practical note from our side: Thunderbit's “AI Suggest Fields” feature lets users define exactly which columns to extract — price, SKU, product name — so the scraper only collects what's necessary. You're not bulk-downloading entire pages; you're selecting structured fields that align with purpose limitation and data minimization principles. That said, no tool makes non-compliant scraping legal. The legal analysis always comes first.

Is Web Scraping Legal in Europe for Your Use Case? Industry-Specific Guidance

The question I see most often in forums isn't "is scraping legal?" — it's "is my scraping legal?" Abstract GDPR theory doesn't answer that. So here's a breakdown by common business use case.

Use Case	Data Type	Key Legal Risks	Likely Outcome
Ecommerce price monitoring (public product listings)	Non-personal (prices, SKUs, product names)	Database Directive sui generis; ToS violation	Generally lower risk if no personal data and no systematic extraction of a "substantial part" of the database
B2B lead generation (contact info from directories)	Personal (names, emails, phone numbers)	GDPR Art. 6 legal basis; Art. 14 notification; ePrivacy for electronic contact	Higher risk — requires documented legitimate interest balancing test plus notification obligation
Real estate listings (property data from portals)	Mixed (addresses may be non-personal; owner names are personal)	Database Directive; ToS; GDPR if owner-linked	Medium risk — anonymize owner data, check ToS, respect robots.txt
AI training data (large-scale web content scraping)	Potentially personal if not filtered	GDPR + EU AI Act Art. 53 TDM obligations	High risk — must comply with both GDPR and AI Act; opt-out mechanisms and robust filtering required

For lower-risk scenarios like public ecommerce data, tools with structured templates — like Thunderbit's instant templates for Amazon and Shopify — reduce exposure because they extract specific, non-personal data fields without collecting extraneous content. For higher-risk scenarios involving personal data (lead generation, for instance), the legal analysis must come first. No scraper, no matter how smart, turns non-compliant collection into compliant collection.

EU vs. US vs. UK: How Web Scraping Laws Compare

If your business operates across borders, you need to understand how the rules differ. I couldn't find a single competitor article that presents this as a scannable side-by-side table, so here it is.

Dimension	EU	US	UK (post-Brexit)
Primary law	GDPR + Database Directive + ePrivacy	CFAA + state laws (limited federal data privacy)	UK GDPR + Data Protection Act 2018
Public data scraping	Still requires GDPR legal basis if personal	Scraping public data generally not a CFAA violation per hiQ v. LinkedIn (contract claims still possible)	Similar to EU; ICO guidance applies
ToS enforcement	Civil matter; Ryanair v. PR Aviation allows contractual limits where sui generis doesn't apply	Van Buren narrowed CFAA; ToS breach ≠ criminal	Civil matter, similar to EU
Database protection	Sui generis right (strong)	No equivalent federal right	Retained sui generis right
AI/TDM exception	DSM Directive Art. 3–4; AI Act Art. 53	No federal TDM exception (fair use doctrine)	Non-commercial research TDM exception (CDPA s.29A); broader TDM exception under review (stalled as of 2026)
Key enforcement body	National DPAs (CNIL, Dutch AP, etc.)	FTC + state AGs	ICO
Recent trend	Stricter (Dutch AP: "almost always illegal" for personal data)	More permissive post-hiQ	Moderate; generally following EU direction

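If your scraping touches data about people in the EU, GDPR can apply even if your company is based in the US or UK: Article 3(2) extends it to non-EU organizations whose processing relates to offering goods or services to people in the EU or monitoring their behavior, and DPAs have applied it to non-EU scrapers such as Clearview AI.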

Real Fines and Cases: What Actually Happens If You Get Caught (2022–2026)

This is the section that answers the question behind the question: "What's the real risk?" I compiled every public DPA enforcement action involving web scraping or scraped personal data from 2022 through April 2026.

Year	Enforcer	Target	Violation	Fine/Outcome
2022	Italian Garante	Clearview AI	Scraping facial images without legal basis	€20M fine + ban + erasure order
2022	Hellenic DPA (Greece)	Clearview AI	Same — facial recognition scraping	€20M fine + ban + deletion
2022	CNIL (France)	Clearview AI	Facial recognition database	€20M fine + €100K/day possible penalty
2023	CNIL (France)	Clearview AI	Non-compliance with 2022 order	€5.2M penalty payment
2023	Austrian DSB	Clearview AI	30B+ facial images from public web	Erasure + EU representative order (no published fine)
2024	Dutch AP	Clearview AI	Illegal facial recognition data collection	€30.5M fine + compliance orders
2024	CNIL (France)	KASPR	LinkedIn contact-data scraping for lead gen	€240,000 fine — 160M contacts, restricted-visibility data, 5-year retention
2024	Irish DPC	X / Grok	Public posts used for AI training	Suspension agreement; statutory inquiry opened in 2025
2024	Irish DPC	Meta	Planned LLM training on public Facebook/Instagram content	Meta paused EU AI training plans
2024	Italian Garante	OpenAI	ChatGPT training data + transparency	€15M fine issued, annulled by Rome court in March 2026

Total EU/EEA monetary penalties in the scraping/open-web category come to more than €95 million (about €95.9 million), excluding the annulled OpenAI fine.
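The total can be checked by summing the monetary entries in the enforcement table. This sketch uses `Decimal` to avoid floating-point rounding; it excludes the annulled OpenAI fine, the Austrian order (no published fine), and the CNIL's €100K/day figure, which was only a possible penalty until the 2023 payment listed separately.

```python
from decimal import Decimal

# Monetary penalties from the enforcement table, in EUR millions.
fines_eur_millions = {
    "Italian Garante / Clearview AI (2022)": Decimal("20"),
    "Hellenic DPA / Clearview AI (2022)": Decimal("20"),
    "CNIL / Clearview AI (2022)": Decimal("20"),
    "CNIL / Clearview AI penalty payment (2023)": Decimal("5.2"),
    "Dutch AP / Clearview AI (2024)": Decimal("30.5"),
    "CNIL / KASPR (2024)": Decimal("0.24"),
}

total = sum(fines_eur_millions.values())
print(f"Total: EUR {total} million")  # Total: EUR 95.94 million
```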

Every one of these major fines targeted mass scraping of biometric or personal data without any legal basis. Clearview scraped billions of facial images. KASPR scraped 160 million contacts, including data from restricted-visibility LinkedIn profiles, and kept it for five years.

Proportionate, targeted scraping of public non-personal data — like product prices or SKU numbers — has not been the subject of enforcement actions. That doesn't make it risk-free, but it helps put the numbers in perspective.

How to Scrape European Websites Safely: A Step-by-Step Guide

Difficulty: Beginner
Time Required: ~15 minutes (including compliance review)
What You'll Need: Chrome browser, Thunderbit extension (free tier works), a target URL, and a quick review of the checklist above

Step 1: Define Your Purpose and Data Needs

Before opening any tool, write down why you need the data and exactly which fields you need. This isn't just good practice — it's the foundation of GDPR's purpose limitation and data minimization principles.

For example: "I need product names, prices, and stock status from 50 Amazon product pages to update our competitive pricing spreadsheet." That's specific. Compare it to: "I want to scrape everything from Amazon." The first passes the minimization test; the second doesn't.
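One way to make the purpose statement binding is to write it down as a small spec and let it filter every scraped row, so fields you never declared cannot slip into your dataset. The `ScrapeSpec` class and the field names below are hypothetical, a sketch of the idea rather than any tool's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScrapeSpec:
    """A written-down purpose and field list, fixed before any scraping starts."""
    purpose: str
    fields: tuple[str, ...]
    max_pages: int

    def minimize(self, record: dict) -> dict:
        # Keep only declared fields; anything else (e.g. seller contact data) is dropped.
        return {name: record[name] for name in self.fields if name in record}


pricing = ScrapeSpec(
    purpose="Update our competitive pricing spreadsheet",
    fields=("product_name", "price", "stock_status"),
    max_pages=50,
)

row = {"product_name": "Kettle", "price": "29.99",
       "stock_status": "In stock", "seller_email": "seller@example.com"}
print(pricing.minimize(row))
# {'product_name': 'Kettle', 'price': '29.99', 'stock_status': 'In stock'}
```

Freezing the dataclass is a deliberate choice: changing the purpose or field list should mean writing a new spec, not quietly editing the old one.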

Step 2: Run the Compliance Checklist

Go through the six-step "Can I Scrape This?" checklist above. If any gate returns 🛑, stop and consult legal counsel before proceeding.

Running our Amazon pricing example through the gates: the data is non-personal (prices, SKUs, product names), so GDPR does not apply ✅; Amazon's ToS should be reviewed (its Conditions of Use restrict data mining and robots, so consider official product data APIs where available) ⚠️; and Database Directive risk is low for 50 products ✅.

Step 3: Choose the Right Scraping Approach

Method	Ease of Use	Compliance Support	Maintenance	Accuracy
Manual copy-paste	Low	N/A (you control what you copy)	High (time-consuming)	Error-prone
Code-based scraper (Python, Scrapy)	Low (requires coding)	None built-in	High (breaks when sites change)	High if maintained
Thunderbit (AI-powered)	Very high	Built-in field-level minimization	Low (AI adapts to page changes)	High
Official API	Medium	Highest (structured, sanctioned access)	Low	Highest

For business users without a dev team, Thunderbit is the fastest path. For sites with official APIs (like Amazon's Product Advertising API), the API is always the safest route — but it often has limitations on data volume and fields.

Step 4: Configure Your Scraper for Compliance

In Thunderbit:

Navigate to your target page (e.g., an Amazon product listing page).
Click the Thunderbit icon in your Chrome toolbar and select "AI Suggest Fields." The AI scans the page and suggests columns like "Product Name," "Price," "Rating," and "Stock Status."
Remove any fields you don't need. If the AI suggests "Seller Name" or "Seller Email" and you only need pricing data, delete those columns. This is data minimization in practice.
Use the Field AI Prompt to add instructions like "exclude personal identifiers" or "extract only public pricing data."
Choose Cloud Scraping for public ecommerce sites (faster, no login needed) or Browser Scraping for sites that require authentication.
Before clicking "Scrape," verify that robots.txt doesn't prohibit scraping for your use case. You can check by visiting [domain]/robots.txt in your browser.

You should now see a table preview with only the fields you've configured — no extraneous personal data, no unnecessary metadata.
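If you also run your own scripts alongside a tool, the robots.txt check can be done in code with Python's standard `urllib.robotparser`. This sketch assumes you have already downloaded the robots.txt text (for example with `RobotFileParser.set_url()` and `read()`); the user-agent string and the 2-second default delay are hypothetical choices, not values any regulator prescribes.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical user agent: name your bot and give a contact URL.
USER_AGENT = "AcmePriceBot/1.0 (+https://example.com/bot-info)"


def plan_requests(robots_txt: str, urls: list[str],
                  default_delay: float = 2.0) -> tuple[list[str], float]:
    """Return the URLs robots.txt allows for USER_AGENT and the delay between requests."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    allowed = [u for u in urls if parser.can_fetch(USER_AGENT, u)]
    # Honour a Crawl-delay directive if present, but never go below our own default.
    delay = max(float(parser.crawl_delay(USER_AGENT) or 0), default_delay)
    return allowed, delay


robots = "User-agent: *\nDisallow: /checkout\nCrawl-delay: 5\n"
urls = ["https://shop.example/p/1", "https://shop.example/checkout/cart"]
print(plan_requests(robots, urls))  # (['https://shop.example/p/1'], 5.0)
```

In a real run you would `time.sleep(delay)` between requests and send `USER_AGENT` in the request headers. As the FAQ notes, passing this check is one layer of compliance, not legal permission on its own.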

Step 5: Export, Store, and Manage Data Responsibly

After scraping, export your data to Excel, Google Sheets, Airtable, or Notion — Thunderbit supports all of these with free export.

Then:

Set a retention period. Don't store scraped data indefinitely. If you're doing weekly price monitoring, last month's raw data probably isn't needed.
If personal data was collected (e.g., for lead generation), document your legal basis, publish an Article 14 transparency notice, and set up a process for handling opt-out and erasure requests.
Automate deletion schedules where possible. For recurring collection, Thunderbit's Scheduled Scraper runs scrapes at set intervals with the same field-level configuration, so each run stays within your compliance parameters.
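If your exports land in a file or database you control, a retention job can be very small. The sketch below assumes each row carries a timezone-aware `scraped_at` timestamp and uses a hypothetical 30-day window; choose the shortest period your documented purpose allows.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention window: set it from your documented purpose.
RETENTION = timedelta(days=30)


def purge_expired(rows: list[dict], now: datetime,
                  retention: timedelta = RETENTION) -> list[dict]:
    """Keep only rows whose 'scraped_at' falls inside the retention window."""
    cutoff = now - retention
    return [row for row in rows if row["scraped_at"] >= cutoff]


now = datetime(2026, 4, 30, tzinfo=timezone.utc)
rows = [
    {"price": "29.99", "scraped_at": datetime(2026, 4, 25, tzinfo=timezone.utc)},
    {"price": "31.50", "scraped_at": datetime(2026, 3, 1, tzinfo=timezone.utc)},
]
print(len(purge_expired(rows, now)))  # 1
```

Passing `now` in explicitly, instead of reading the clock inside the function, keeps the job easy to test and to re-run for a specific date.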

Tips for Staying Compliant While Scraping in Europe

Some practices I've picked up from researching this topic and talking to compliance-minded teams:

Always review ToS before scraping a new site. It takes two minutes and can save you months of legal headaches.
Use APIs when available. They're structured, sanctioned, and the safest route. Scraping should be the fallback, not the default.
Conduct a DPIA for any project involving personal data at scale. CNIL says AI training datasets can create high risk, and the DPIA is your accountability proof. Even for smaller projects, documenting your analysis is smart.
Keep a scraping log. Record what was scraped, when, from where, your legal basis, and your retention period. If a DPA ever asks, you'll be glad you have it.
Monitor regulatory updates. DPA guidance is evolving fast — CNIL published new AI scraping sheets in January 2026, and the EDPB is expected to issue further opinions. The rules today may tighten tomorrow.
Don't scrape from restricted or sensitive sources. CNIL's mandatory exclusion list includes health forums, sites mainly used by minors, pornographic sites, genealogy sites, and highly structured personal-data sites. If you're building a scraping project, maintain a default blocklist.
Automated traffic is a big deal operationally. Akamai reported that bots made up 42% of overall web traffic in 2024, and Thales/Imperva found automated bot traffic surpassed human traffic for the first time, reaching 51% in 2024. Regulators increasingly treat bot behavior, rate, and evasion as evidence of risk and unfairness. Behaving like a responsible scraper — identifying your user agent, rate-limiting, respecting opposition signals — isn't just polite; it's legally relevant.
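The scraping log and the sensitive-source blocklist from the tips above fit naturally together: refuse blocklisted domains, then record one line per run. This is a minimal sketch; the domains in `BLOCKED_DOMAINS` and the log fields are hypothetical placeholders that you would extend with the categories CNIL lists.

```python
import json
from datetime import datetime, timezone
from urllib.parse import urlparse

# Hypothetical default blocklist; extend it with health, minors', genealogy sites, etc.
BLOCKED_DOMAINS = {"health-forum.example", "kids-site.example"}


def is_blocked(hostname: str) -> bool:
    """Match a blocked domain and any of its subdomains."""
    return any(hostname == d or hostname.endswith("." + d) for d in BLOCKED_DOMAINS)


def log_scrape(source_url: str, fields: list[str], legal_basis: str, retention_days: int) -> str:
    """Return one JSON line describing a scrape run, refusing blocklisted sources."""
    hostname = urlparse(source_url).hostname or ""
    if is_blocked(hostname):
        raise ValueError(f"{hostname} is on the sensitive-source blocklist")
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "source": source_url,
        "fields": fields,
        "legal_basis": legal_basis,
        "retention_days": retention_days,
    }
    return json.dumps(entry)


print(log_scrape("https://shop.example/p/1", ["price"], "non-personal data", 30))
```

Appending each returned line to a file opened in append mode gives you a simple, chronological record to show a DPA if asked.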

Conclusion

Web scraping is not illegal in Europe. But it's regulated — especially when personal data is involved.

The legal outcome depends on what you scrape (personal vs. non-personal), how you scrape (ToS, robots.txt, rate limiting, field-level minimization), and why (documented purpose and legal basis). The enforcement record is clear: mass, indiscriminate scraping of personal data without any legal basis is where companies face seven- and eight-figure fines. Proportionate, targeted scraping of public non-personal data — with safeguards in place — sits in a very different risk category.

The practical framework:

Use the decision checklist before every scraping project.
Apply DPA-recommended safeguards (transparency, minimization, retention limits, opt-out mechanisms).
Choose tools that support compliance by design. Thunderbit's AI-powered field selection, structured extraction, and free export to Google Sheets, Excel, Airtable, and Notion make it straightforward to scrape only the data you need — no more, no less.
Document everything. Balancing test, source list, retention schedule, DPIA. If a regulator asks, your file is your defense.

Obligatory disclaimer: this article is informational, not legal advice. For high-risk scenarios involving personal data at scale, consult a qualified privacy attorney. The regulations are evolving, and the cost of getting it wrong is real.

Want to try compliant, targeted web scraping for yourself? Thunderbit's free tier lets you experiment with structured extraction on a small scale — define your fields, scrape only what you need, and export in clicks. You can also explore our YouTube channel for step-by-step walkthroughs.


FAQs

1. Is web scraping legal in Europe if the data is publicly available?

Public availability does not exempt data from GDPR if it contains personal information. As the Dutch DPA stated, "public does not automatically mean permission for scraping." Non-personal public data (product prices, SKUs) is generally lower risk, but you still need to check the Database Directive and the site's Terms of Service.

2. Can I scrape emails and phone numbers from European websites?

Emails and phone numbers are personal data under GDPR. You need a lawful basis — typically legitimate interest with a documented balancing test — and you must notify individuals under Article 14. The CNIL fined KASPR €240,000 in 2024 for scraping LinkedIn contact data without adequate transparency or legal basis, so this is an area where enforcement is active.

3. What is the biggest fine for illegal web scraping in Europe?

The Dutch DPA fined Clearview AI €30.5 million in 2024 for illegal facial recognition data collection from the public web. Multiple other EU DPAs fined Clearview €20 million each. Total EU/EEA scraping-related fines from 2022–2026 exceed €95 million.

4. Does respecting robots.txt make web scraping legal in Europe?

Respecting robots.txt is a best practice and aligns with CNIL's mandatory safeguards, but it doesn't guarantee legality on its own. You still need to comply with GDPR (if personal data is involved), the Database Directive, and the site's Terms of Service. Think of robots.txt compliance as one layer in a multi-layer compliance framework.

5. How is web scraping law different in Europe vs. the US?

The EU is significantly stricter. GDPR applies to any personal data, even publicly available data, and the Database Directive provides strong protection for organized datasets. The US has no federal equivalent to either law; after hiQ v. LinkedIn, scraping public data generally does not violate the federal Computer Fraud and Abuse Act, although contract claims remain possible (hiQ ultimately lost on breach of LinkedIn's User Agreement). The UK post-Brexit sits in between, with UK GDPR and retained database rights largely mirroring EU rules but with ICO enforcement. For cross-border businesses, the EU's rules set the highest bar, and if you're scraping data about people in the EU, GDPR can apply regardless of where your company is based.
