If you've followed an Amazon scraping tutorial only to hit a wall of CAPTCHAs, 503 errors, or completely empty results — welcome to the club. Most Python Amazon scraping guides floating around the internet were written in 2022 or 2023, and they use selectors and techniques that Amazon has long since patched.
I've spent years building data extraction tools at Thunderbit, and one thing I can tell you from the trenches: Amazon is one of the hardest sites to scrape reliably. The platform changes its HTML structure constantly, deploys a six-layer anti-bot defense, and even serves different page layouts to different users via A/B testing. In this guide, I'm going to walk you through a Python Amazon scraper that actually works in 2025 — with verified CSS selectors, a layered anti-blocking strategy, and guidance on scheduling and exporting that most tutorials skip entirely. And for folks who just need the data without wrestling with Python, I'll also show you how you can do the same job in about two clicks.
What Is Amazon Product Scraping?
Amazon product scraping is the process of programmatically extracting publicly available data — product names, prices, ratings, review counts, images, availability, and more — from Amazon's product and search result pages. Instead of manually copying information from hundreds of listings, a scraper visits each page, reads the HTML, and pulls out the data you specify into a structured format like CSV, Excel, or a database.
Think of it as hiring a tireless intern who can visit a thousand product pages in the time it takes you to finish your morning coffee. Except this intern never misspells anything and doesn't need lunch breaks.
Why Scrape Amazon Products with Python?
Amazon hosts hundreds of millions of listings across 30+ categories, powered by millions of third-party sellers. Third-party sellers now represent 69% of total GMV. Manually monitoring even a fraction of that catalog is impossible. Here's why teams scrape Amazon:
| Use Case | Who Benefits | What They Extract |
|---|---|---|
| Price monitoring & repricing | Ecommerce ops, marketplace sellers | Prices, availability, seller info |
| Competitor analysis | Product managers, brand teams | Product features, ratings, review counts |
| Market research | Analysts, new product teams | Category trends, pricing distributions |
| Lead generation | Sales teams | Seller names, brand info, contact data |
| Affiliate marketing | Content creators, deal sites | Prices, deals, product details |
| Inventory tracking | Supply chain, procurement | Stock status, delivery estimates |
The scale of Amazon's pricing alone makes automation essential: Amazon changes prices millions of times per day, with the average product's price updating roughly every 10 minutes. By contrast, competitors like Best Buy and Walmart change prices only about 50,000 times per month. No human team can keep up.

Python gives you full control over the scraping process — you decide what to extract, how to handle errors, and where to store the data. But it also means you're responsible for maintenance, anti-blocking, and keeping up with Amazon's frequent HTML changes.
What You Can Scrape from Amazon (and What You Can't)
From publicly accessible product pages, you can typically extract:
- Product title (name, brand)
- Price (current, original, deal price)
- Rating (star average)
- Review count
- Product images (main image URL)
- Availability / stock status
- ASIN (Amazon Standard Identification Number)
- Product description and bullet points
- Seller information
- Product variations (size, color, etc.)
What you should avoid:
- Data behind login walls: Extended review pages, personal account data, order history
- Personal information: Buyer names, addresses, payment info
- Copyrighted content for republishing: Product descriptions and images are fine for analysis, but don't republish them as your own
Amazon's robots.txt blocks 50+ named bots (including GPTBot, Scrapy, and ClaudeBot) and disallows paths like user accounts, carts, and wishlists. Product detail pages are not explicitly disallowed, but Amazon's Terms of Service do prohibit automated access. Courts have generally distinguished between ToS violations (a civil matter) and criminal violations under the CFAA — more on legality at the end of this guide.
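You can inspect how a robots.txt file treats different bots and paths with Python's standard-library robotparser. The excerpt below is illustrative of the pattern described above, not Amazon's actual file:

```python
from urllib import robotparser

# Illustrative excerpt in the style of Amazon's robots.txt -- NOT the real file.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /gp/cart
Disallow: /wishlist/

User-agent: GPTBot
Disallow: /
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(SAMPLE_ROBOTS)

print(rp.can_fetch("*", "https://www.amazon.com/dp/B0DGNFM9YJ"))      # True
print(rp.can_fetch("*", "https://www.amazon.com/gp/cart/view.html"))  # False
print(rp.can_fetch("GPTBot", "https://www.amazon.com/dp/B0DGNFM9YJ")) # False
```

Checking can_fetch before scraping a path is a cheap way to stay on the right side of the "stick to public product pages" advice above.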
Tools and Libraries You'll Need
Here's the Python stack for this tutorial:
| Library | Purpose | Why We Use It |
|---|---|---|
requests | HTTP requests | Simple, widely supported |
beautifulsoup4 | HTML parsing | Easy CSS selector-based extraction |
lxml | Fast HTML parser | Used as BeautifulSoup's parser backend |
curl_cffi | TLS fingerprint impersonation | Critical for bypassing Amazon's detection |
pandas | Data structuring & export | DataFrames, CSV/Excel export |
Optional (for JavaScript-rendered content):
selenium or playwright — headless browser automation
Setting Up Your Python Environment
Open your terminal and run:
```bash
mkdir amazon-scraper && cd amazon-scraper
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install requests beautifulsoup4 lxml curl_cffi pandas
```
Verify everything installed:
```python
import requests, bs4, curl_cffi, pandas
print("All good!")
```
If you see "All good!" with no errors, you're ready.

Why Most Amazon Scraping Tutorials Break (and How This One Is Different)
This is the part most guides skip, and it's the reason you're probably reading this article in the first place.
Amazon frequently updates its HTML structure, class names, and element IDs. The scraping community reports that scrapers routinely need weekly fixes due to DOM shifts and fingerprinting changes. The most infamous casualty? The selector #priceblock_ourprice, which appeared in hundreds of tutorials from 2018–2023. That ID no longer exists on Amazon product pages.
A quick comparison of what's broken vs. what works now:
| Data Point | Broken Selector (Pre-2024) | Working 2025 Selector |
|---|---|---|
| Price | #priceblock_ourprice | div#corePriceDisplay_desktop_feature_div span.a-price .a-offscreen |
| Title | #productTitle | span#productTitle (still works) |
| Rating | span.a-icon-alt (sometimes wrong context) | #acrPopover span.a-icon-alt |
| Review Count | #acrCustomerReviewCount | span#acrCustomerReviewText |
| Availability | #availability span | div#availability span.a-size-medium |
Every code snippet in this guide was tested against live Amazon pages in 2025. I'll show you the actual CSS selectors alongside expected output — no copy-pasting from 2022.
Before You Start
- Difficulty: Intermediate (basic Python knowledge assumed)
- Time Required: ~30–45 minutes for the full tutorial; ~10 minutes for the basic scraper
- What You'll Need: Python 3.9+, Chrome browser (for inspecting Amazon pages), a terminal, and optionally the Thunderbit Chrome extension if you want to compare the no-code approach
Step 1: Send Your First Request to Amazon
Navigate to any Amazon product page in your browser and copy the URL. We'll start with a simple requests.get():
```python
import requests

url = "https://www.amazon.com/dp/B0DGNFM9YJ"
response = requests.get(url)
print(response.status_code)
print(response.text[:500])
```
Run this, and you'll almost certainly get a 503 status code or a page that says "To discuss automated access to Amazon data please contact…" That's Amazon's WAF (Web Application Firewall) detecting your Python script. A bare requests.get() without proper headers achieves roughly a 2% success rate against Amazon.
You should see something like: 503 and a block page in the HTML. That's expected — we'll fix it in the next step.
Step 2: Set Up Custom Headers and TLS Impersonation
Simply adding a User-Agent header isn't enough anymore. Amazon compares your HTTP headers against your TLS fingerprint. If you claim to be Chrome 120 but your TLS handshake reveals Python's requests library, you're flagged and blocked.
The most reliable approach in 2025 is to use curl_cffi with browser impersonation:
```python
from curl_cffi import requests as cfreq

headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Referer": "https://www.google.com/",
    "DNT": "1",
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1",
}

url = "https://www.amazon.com/dp/B0DGNFM9YJ"
response = cfreq.get(url, headers=headers, impersonate="chrome124")
print(response.status_code)
print(len(response.text))
```
With curl_cffi impersonating Chrome 124, success rates jump to approximately 94% — a 47x improvement over plain requests. You should now see a 200 status code and a much longer HTML response (100,000+ characters).
If you still get a 503, try a different impersonate value (e.g., "chrome131") or add a short delay before retrying.
Step 3: Parse the HTML and Extract Product Data
Now that we have the full HTML, let's extract the data using BeautifulSoup with verified 2025 selectors:
```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(response.text, "lxml")

# Product Title
title_el = soup.select_one("span#productTitle")
title = title_el.get_text(strip=True) if title_el else None

# Price (multiple fallbacks for different page layouts)
price_el = soup.select_one(
    "div#corePriceDisplay_desktop_feature_div span.a-price .a-offscreen"
)
if not price_el:
    price_el = soup.select_one("span.priceToPay .a-offscreen")
if not price_el:
    price_el = soup.select_one(".apexPriceToPay .a-offscreen")
price = price_el.get_text(strip=True) if price_el else None

# Rating
rating_el = soup.select_one("#acrPopover span.a-icon-alt")
rating = rating_el.get_text(strip=True) if rating_el else None

# Review Count
reviews_el = soup.select_one("span#acrCustomerReviewText")
reviews = reviews_el.get_text(strip=True) if reviews_el else None

# Availability
avail_el = soup.select_one("div#availability span")
availability = avail_el.get_text(strip=True) if avail_el else None

# Main Image URL
img_el = soup.select_one("#landingImage")
image_url = img_el.get("src") if img_el else None

print(f"Title: {title}")
print(f"Price: {price}")
print(f"Rating: {rating}")
print(f"Reviews: {reviews}")
print(f"Availability: {availability}")
print(f"Image: {image_url}")
```
Expected output (example):
```text
Title: Apple AirPods Pro (2nd Generation) with USB-C
Price: $189.99
Rating: 4.7 out of 5 stars
Reviews: 98,432 ratings
Availability: In Stock
Image: https://m.media-amazon.com/images/I/61SUj2...
```
Notice the multiple fallback selectors for price — Amazon uses different containers depending on the product type, deal status, and A/B test variant. Wrapping each extraction in a conditional check prevents your scraper from crashing when a selector doesn't match.
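One way to keep those fallbacks manageable as Amazon adds more layout variants is a small helper that tries selectors in order and returns the first match. The select_first helper below is a sketch, not part of any library, and the inline HTML is a minimal stand-in for one of Amazon's price containers:

```python
from bs4 import BeautifulSoup

def select_first(soup, selectors):
    """Return stripped text from the first selector that matches, else None."""
    for sel in selectors:
        el = soup.select_one(sel)
        if el:
            return el.get_text(strip=True)
    return None

PRICE_SELECTORS = [
    "div#corePriceDisplay_desktop_feature_div span.a-price .a-offscreen",
    "span.priceToPay .a-offscreen",
    ".apexPriceToPay .a-offscreen",
]

# Minimal HTML mimicking one of Amazon's price containers:
html = (
    '<div id="corePriceDisplay_desktop_feature_div">'
    '<span class="a-price"><span class="a-offscreen">$189.99</span></span></div>'
)
soup = BeautifulSoup(html, "html.parser")
print(select_first(soup, PRICE_SELECTORS))  # $189.99
```

When Amazon ships a new price container, you append one selector to the list instead of adding another if/else branch.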
Step 4: Scrape Multiple Products from Search Results
To build a real dataset, you'll want to start from an Amazon search results page, collect ASINs, then scrape each product detail page.
```python
import time
import random

def get_search_asins(keyword, max_pages=1):
    """Collect ASINs from Amazon search results."""
    asins = []
    for page in range(1, max_pages + 1):
        search_url = f"https://www.amazon.com/s?k={keyword}&page={page}"
        resp = cfreq.get(search_url, headers=headers, impersonate="chrome124")
        if resp.status_code != 200:
            print(f"Search page {page} returned {resp.status_code}")
            break
        search_soup = BeautifulSoup(resp.text, "lxml")
        results = search_soup.select('div[data-component-type="s-search-result"]')
        for r in results:
            asin = r.get("data-asin")
            if asin:
                asins.append(asin)
        print(f"Page {page}: found {len(results)} products")
        time.sleep(random.uniform(2, 5))  # Polite delay
    return asins

asins = get_search_asins("wireless+earbuds", max_pages=2)
print(f"Collected {len(asins)} ASINs")
```
Each ASIN maps to a clean product URL: https://www.amazon.com/dp/{ASIN}. This is more reliable than using the full search result URLs, which can contain session-specific parameters.
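If your product links come from somewhere other than search results (affiliate feeds, old spreadsheets), a quick regex can normalize any messy URL down to its ASIN and rebuild the clean /dp/ form. The asin_from_url helper below is a sketch based on Amazon's common URL shapes:

```python
import re

def asin_from_url(url):
    """Extract the 10-character ASIN from a messy Amazon product URL."""
    m = re.search(r"/(?:dp|gp/product)/([A-Z0-9]{10})", url)
    return m.group(1) if m else None

messy = ("https://www.amazon.com/Apple-AirPods-Pro/dp/B0DGNFM9YJ/"
         "ref=sr_1_3?keywords=earbuds&qid=1700000000")
asin = asin_from_url(messy)
print(asin)                                 # B0DGNFM9YJ
print(f"https://www.amazon.com/dp/{asin}")  # clean canonical URL
```

Deduplicating on ASIN rather than full URL also prevents scraping the same product twice under different tracking parameters.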
Step 5: Handle Pagination and Scrape at Scale
Now let's combine search collection and detail page scraping into a full pipeline:
```python
import pandas as pd

def scrape_product(asin):
    """Scrape a single Amazon product detail page."""
    url = f"https://www.amazon.com/dp/{asin}"
    try:
        resp = cfreq.get(url, headers=headers, impersonate="chrome124")
        if resp.status_code != 200:
            return None
        soup = BeautifulSoup(resp.text, "lxml")
        title_el = soup.select_one("span#productTitle")
        price_el = (
            soup.select_one("div#corePriceDisplay_desktop_feature_div span.a-price .a-offscreen")
            or soup.select_one("span.priceToPay .a-offscreen")
            or soup.select_one(".apexPriceToPay .a-offscreen")
        )
        rating_el = soup.select_one("#acrPopover span.a-icon-alt")
        reviews_el = soup.select_one("span#acrCustomerReviewText")
        avail_el = soup.select_one("div#availability span")
        img_el = soup.select_one("#landingImage")
        return {
            "asin": asin,
            "title": title_el.get_text(strip=True) if title_el else None,
            "price": price_el.get_text(strip=True) if price_el else None,
            "rating": rating_el.get_text(strip=True) if rating_el else None,
            "reviews": reviews_el.get_text(strip=True) if reviews_el else None,
            "availability": avail_el.get_text(strip=True) if avail_el else None,
            "image_url": img_el.get("src") if img_el else None,
            "url": url,
        }
    except Exception as e:
        print(f"Error scraping {asin}: {e}")
        return None

# Scrape all collected ASINs
products = []
for i, asin in enumerate(asins):
    print(f"Scraping {i+1}/{len(asins)}: {asin}")
    product = scrape_product(asin)
    if product:
        products.append(product)
    time.sleep(random.uniform(2, 5))  # Random delay between requests

df = pd.DataFrame(products)
print(f"\nScraped {len(df)} products successfully")
print(df.head())
```
The random delay between 2–5 seconds is critical. Perfectly regular timing (e.g., exactly 3 seconds every time) looks suspicious to Amazon's behavioral analysis. Random intervals mimic human browsing patterns.
Step 6: Save Scraped Amazon Data to CSV
```python
df.to_csv("amazon_products.csv", index=False, encoding="utf-8-sig")
print("Saved to amazon_products.csv")
```
You should now have a clean CSV with columns for ASIN, title, price, rating, reviews, availability, image URL, and product URL. This is where most tutorials stop — but if you're building a real workflow, CSV is just the beginning.
Anti-Blocking Deep Dive: How to Keep Your Scraper Running
Getting blocked is the number-one problem for anyone who tries to scrape Amazon products with Python. Amazon's six-layer defense includes IP reputation analysis, TLS fingerprinting, browser environment checks, behavioral biometrics, CAPTCHAs, and ML-driven anomaly detection. Below is a layered strategy to address each one.
Rotate User-Agents and Full Headers
A single static User-Agent gets flagged fast. Rotate through a list of current browser strings:
```python
import random

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/143.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/143.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/26.0 Safari/605.1.15",
]

def get_headers():
    return {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
        "Referer": "https://www.google.com/",
        "DNT": "1",
        "Connection": "keep-alive",
    }
```
One detail that trips people up: your Accept-Language must match the geographic location implied by your IP. Sending Accept-Language: en-US from a German IP is a red flag.
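One way to keep the two consistent is to tag each proxy with its exit-node country and derive Accept-Language from that tag. The proxy entries and locale strings below are placeholders, not real endpoints:

```python
import random

# Hypothetical proxy pool; each entry is tagged with its exit-node country.
PROXY_POOL = [
    {"url": "http://user:pass@us-proxy.example.com:8080", "country": "US"},
    {"url": "http://user:pass@de-proxy.example.com:8080", "country": "DE"},
]

# Accept-Language values that match each country's implied locale.
LOCALES = {
    "US": "en-US,en;q=0.9",
    "DE": "de-DE,de;q=0.9,en;q=0.5",
}

def pick_proxy_and_headers():
    """Pick a proxy and build headers whose locale matches its IP."""
    proxy = random.choice(PROXY_POOL)
    geo_headers = {"Accept-Language": LOCALES[proxy["country"]]}
    return proxy["url"], geo_headers

proxy_url, geo_headers = pick_proxy_and_headers()
print(proxy_url, geo_headers["Accept-Language"])
```

In a real scraper you'd merge geo_headers into the full header set from get_headers() before each request.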
TLS Fingerprint Impersonation with curl_cffi
We covered this in Step 2, but it's worth emphasizing: this single technique provides the biggest improvement in success rate. Standard Python requests achieves about 2% success against Amazon. With curl_cffi impersonation, you're at roughly 94%. That's the difference between a working scraper and a broken one.
```python
from curl_cffi import requests as cfreq

# Rotate impersonation targets too
BROWSERS = ["chrome120", "chrome124", "chrome131"]

response = cfreq.get(
    url,
    headers=get_headers(),
    impersonate=random.choice(BROWSERS),
)
```
Proxy Rotation
For scraping more than a handful of pages, you'll need proxy rotation. Amazon tracks IP addresses and will block any single IP that sends too many requests.
```python
PROXIES = [
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
    "http://user:pass@proxy3.example.com:8080",
]

proxy = random.choice(PROXIES)
response = cfreq.get(
    url,
    headers=get_headers(),
    impersonate="chrome124",
    proxies={"http": proxy, "https": proxy},
)
```
Residential proxies are more effective than datacenter proxies (Amazon blocks datacenter IP ranges proactively), but they're also more expensive. For a small project, you might start with a small residential proxy pool and scale up as needed.
Rate Limiting and Exponential Backoff
No competitor article I found covers this, but it's essential. When you get a 503 or CAPTCHA response, don't just retry immediately — that's a fast path to a permanent ban.
```python
import time
import random

def fetch_with_backoff(url, max_retries=3):
    """Fetch a URL with exponential backoff on failure."""
    for attempt in range(max_retries):
        response = cfreq.get(
            url,
            headers=get_headers(),
            impersonate=random.choice(BROWSERS),
        )
        if response.status_code == 200:
            return response
        # Exponential backoff with jitter
        wait = min(2 ** attempt + random.uniform(0, 1), 30)
        print(f"Attempt {attempt+1} failed ({response.status_code}). Waiting {wait:.1f}s...")
        time.sleep(wait)
    return None  # All retries exhausted
```
The formula wait = min(2^attempt + jitter, max_delay) ensures your delays grow (2s, 4s, 8s...) but never exceed a reasonable cap. The random jitter prevents your retry pattern from being fingerprinted.
Selenium or Playwright Fallback for JS-Rendered Content
Some Amazon pages (especially those with dynamic pricing widgets or variation selectors) require JavaScript to render fully. When curl_cffi returns incomplete HTML, a headless browser is your fallback:
```python
from playwright.sync_api import sync_playwright

def scrape_with_browser(url):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="domcontentloaded")
        page.wait_for_timeout(3000)  # Let JS render
        html = page.content()
        browser.close()
        return html
```
This is slower — 3–5 seconds per page vs. under 1 second with curl_cffi. Use it only when needed.
In my experience, curl_cffi handles 90%+ of Amazon product pages without a browser.
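A simple heuristic for deciding when to fall back: check whether the fast curl_cffi response actually contains the markup you need. The marker strings below are an assumption based on the selectors used earlier in this guide:

```python
def needs_browser(html):
    """Heuristic: fall back to a headless browser when key markers are missing."""
    markers = ["productTitle", "a-price"]  # assumption: IDs/classes used earlier
    return not all(m in html for m in markers)

# Usage sketch:
#   html = resp.text                      # fast curl_cffi path
#   if needs_browser(html):
#       html = scrape_with_browser(url)   # slow headless-browser fallback

print(needs_browser("<span id='productTitle'>X</span>"))  # True: no price markup
```

This keeps the expensive browser path reserved for the small fraction of pages that genuinely need it.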
Anti-Blocking Summary
| Technique | Difficulty | Effectiveness | Covered by Most Tutorials? |
|---|---|---|---|
| Custom User-Agent | Easy | Low (Amazon detects patterns) | Yes |
| Full header rotation | Easy | Medium | Rarely |
| TLS impersonation (curl_cffi) | Medium | High (~94% success) | Almost never |
| Proxy rotation | Medium | High | Briefly, if at all |
| Rate limiting + exponential backoff | Easy | Medium | No |
| Selenium/Playwright fallback | Medium | High (for JS content) | Mentioned, not demonstrated |
Beyond CSV: Export Scraped Amazon Data to Google Sheets, Airtable, and More
Every tutorial I reviewed stops at CSV export. But real business workflows need data in Google Sheets, databases, or tools like Airtable and Notion.
Export to Google Sheets with gspread
First, set up a Google service account (one-time setup):
- Go to the Google Cloud Console → APIs & Services → Credentials
- Create a service account and download the JSON key file
- Save it to ~/.config/gspread/service_account.json
- Share your target spreadsheet with the client_email from the JSON file
Then:
```python
import gspread
from gspread_dataframe import set_with_dataframe

gc = gspread.service_account()
sh = gc.open("Amazon Scrape Data")
worksheet = sh.sheet1
set_with_dataframe(worksheet, df)
print("Data exported to Google Sheets!")
```
This writes your entire DataFrame directly to a Google Sheet — live, shareable, and ready for dashboards.
Store in SQLite for Local Analysis
For larger datasets or historical tracking, SQLite is perfect — no server setup, just a single file:
```python
import sqlite3

conn = sqlite3.connect("amazon_products.db")
df.to_sql("products", conn, if_exists="append", index=False)
print(f"Stored {len(df)} products in SQLite")

# Query later:
historical = pd.read_sql_query(
    "SELECT * FROM products WHERE price IS NOT NULL ORDER BY rowid DESC LIMIT 100",
    conn,
)
```
The No-Code Alternative
If you don't want to maintain Python export scripts, Thunderbit offers free export to Google Sheets, Airtable, Notion, Excel, CSV, and JSON — including image fields that render directly in Airtable and Notion. No gspread setup, no API credentials, no code at all. For teams that need data flowing into their existing tools, it's a significant time saver.
Scheduling Automated Amazon Scrapes — The Missing Chapter
Price monitoring and inventory tracking require recurring scrapes, not one-off runs. Yet I couldn't find a single competitor article that covers scheduling. Here's how to automate your Python scraper.
Cron Jobs (Linux/macOS)
Open your crontab:
```bash
crontab -e
```
Add a line to run your scraper daily at 6 AM:
```bash
0 6 * * * cd /path/to/amazon-scraper && /path/to/venv/bin/python scraper.py >> ~/scraper.log 2>&1
```
Or every 6 hours:
```bash
0 */6 * * * cd /path/to/amazon-scraper && /path/to/venv/bin/python scraper.py >> ~/scraper.log 2>&1
```
Windows Task Scheduler
Create a batch file run_scraper.bat:
```bat
@echo off
cd /d "C:\path\to\amazon-scraper"
call venv\Scripts\activate
python scraper.py
deactivate
```
Then open Task Scheduler → Create Basic Task → set your trigger (Daily, Hourly) → Action: "Start a program" → browse to run_scraper.bat.
GitHub Actions (Free Tier)
For a cloud-based schedule with zero infrastructure:
```yaml
name: Amazon Scraper
on:
  schedule:
    - cron: "0 6 * * *"  # Daily at 6 AM UTC
  workflow_dispatch:      # Manual trigger
jobs:
  scrape:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run scraper
        run: python scraper.py
      - name: Commit results
        run: |
          git config user.name 'GitHub Actions'
          git config user.email 'actions@github.com'
          git add data/
          git diff --staged --quiet || git commit -m "Update scraped data"
          git push
```
Store proxy credentials in GitHub Secrets, and you've got a free, automated scraping pipeline.
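Inside scraper.py, read those secrets from the environment rather than hardcoding them. PROXY_URL is a hypothetical secret name; you'd expose it in the workflow with an `env:` mapping like `PROXY_URL: ${{ secrets.PROXY_URL }}`:

```python
import os

# PROXY_URL is a hypothetical GitHub Secret exposed via the workflow's env block.
proxy_url = os.environ.get("PROXY_URL")  # None when the secret isn't set

proxies = {"http": proxy_url, "https": proxy_url} if proxy_url else None
print("Using proxy" if proxies else "No proxy configured")
```

This keeps credentials out of your repository history while letting the same script run locally without a proxy.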
No-Code Alternative: Thunderbit's Scheduled Scraper
For teams that don't want to manage cron syntax or cloud infrastructure, Thunderbit offers a built-in scheduled scraper. You describe the schedule in plain English (e.g., "every day at 8 AM" or "every Monday"), add your Amazon URLs, and click "Schedule." No terminal, no YAML files, no deployment pipeline. It's particularly useful for ecommerce teams running continuous price or inventory monitoring.
Python DIY vs. Scraper API vs. No-Code: Which Approach Should You Use?
This is a question I see on forums constantly, and no top-ranking article provides a structured answer. So here's my honest take:
| Criteria | Python + BS4/curl_cffi | Scraper API (ScraperAPI, Oxylabs) | No-Code (Thunderbit) |
|---|---|---|---|
| Setup time | 30–60 min | 10–20 min | ~2 minutes |
| Coding required | Yes (Python) | Yes (API calls) | None |
| Anti-blocking built-in | No (DIY) | Yes | Yes |
| Handles JS rendering | Only with Selenium/Playwright | Varies by provider | Yes (Browser or Cloud mode) |
| Scheduling | DIY (cron/cloud) | Some offer it | Built-in |
| Cost | Free (+ proxy costs) | $30–100+/mo | Free tier available |
| Maintenance | High (selectors break) | Low | None (AI adapts) |
| Best for | Developers wanting full control | Scale & reliability at volume | Speed, non-developers, business users |
Python is the right choice if you want to learn, customize every detail, and don't mind ongoing maintenance. Scraper APIs handle anti-blocking for you but still require code. And Thunderbit is the fastest path for sales, ecommerce ops, or anyone who just needs the data — no selectors, no code, no maintenance when Amazon changes its HTML.
How Thunderbit Scrapes Amazon Products in 2 Clicks
I'm biased, of course — my team built this. But the workflow genuinely is this simple:
- Install the Thunderbit Chrome extension
- Navigate to an Amazon search results or product page
- Click "AI Suggest Fields" (or use the instant Amazon scraper template)
- Click "Scrape"
Thunderbit's AI reads the page, identifies the data structure, and extracts everything into a clean table. You can export to Excel, Google Sheets, Airtable, or Notion for free. The real payoff: when Amazon changes its HTML next week (and it will), Thunderbit's AI adapts automatically. No broken scripts, no selector updates.
For enriching product lists with detail-page data, Thunderbit's Subpage Scraping feature automatically follows links to product pages and pulls in additional fields like images, descriptions, and variations — something that takes significant extra code in Python.
Tips to Keep Your Python Amazon Scraper Working Long-Term
If you're going the Python route, here's how to minimize maintenance headaches:
- Check selectors regularly. Amazon changes them often. Bookmark this article — I'll update the selector table as things change.
- Monitor your success rate. Track the ratio of 200 responses vs. 503s/CAPTCHAs. Set up an alert (even a simple email) when your success rate drops below 80%.
- Store raw HTML. Save the full HTML response alongside your parsed data. If selectors change, you can re-parse historical data without re-scraping.
- Rotate proxies and User-Agents frequently. Static fingerprints get flagged within hours at scale.
- Use exponential backoff. Never retry immediately after a block.
- Containerize with Docker. Wrap your scraper in a Docker container for easy deployment and portability.
- Add data validation. Check that prices are numeric, ratings are between 1–5, and titles aren't empty. Catching bad records at scrape time is far cheaper than cleaning them out of your dataset later.
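The success-rate monitoring tip above can be sketched in a few lines. The SuccessMonitor class below is a hypothetical helper, not a library API:

```python
from collections import deque

class SuccessMonitor:
    """Hypothetical helper: rolling success rate over the last N requests."""

    def __init__(self, window=100, threshold=0.8):
        self.results = deque(maxlen=window)
        self.threshold = threshold

    def record(self, status_code):
        self.results.append(status_code == 200)

    @property
    def rate(self):
        return sum(self.results) / len(self.results) if self.results else 1.0

    def should_alert(self, min_samples=10):
        return len(self.results) >= min_samples and self.rate < self.threshold

monitor = SuccessMonitor()
for code in [200] * 7 + [503] * 3:
    monitor.record(code)

print(f"Success rate: {monitor.rate:.0%}")  # Success rate: 70%
print(monitor.should_alert())               # True
```

Call monitor.record(response.status_code) after every fetch, and wire should_alert() to whatever notification channel you already use (email, Slack, a log line).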
Or, if all of that sounds like more work than you signed up for, consider whether a no-code tool like Thunderbit might be a better fit for your use case. There's no shame in choosing the faster path — I've spent enough years debugging scrapers to know that sometimes the best code is the code you don't have to write.
Legal and Ethical Considerations When Scraping Amazon
Since this comes up in every conversation about Amazon scraping, a quick note on the legal landscape:
- Publicly available data scraping is generally legal in the US. The landmark hiQ v. LinkedIn ruling (2022) established that accessing public data doesn't violate the CFAA. More recently, rulings such as Meta v. Bright Data (2024) reinforced this principle.
- Amazon's ToS prohibit automated access. This is a civil matter (breach of contract), not a criminal one. Courts have generally distinguished between the two.
- Amazon v. Perplexity (2025) is an active case involving AI scraping of Amazon pages. A preliminary injunction was issued in March 2026. This is worth watching.
- Stick to public pages. Don't scrape login-protected content, personal data, or anything behind authentication.
- Respect rate limits. Don't hammer Amazon's servers. A delay of 2–5 seconds between requests is reasonable.
- Use data responsibly. Scrape for analysis, not for republishing copyrighted content.
- Consult legal counsel for large-scale commercial use, especially if you're in the EU (GDPR applies to personal data).
For a deeper dive, see our guide on whether web scraping is legal.
Wrapping Up
You now have a working Python Amazon scraper with verified 2025 selectors, a layered anti-blocking strategy that goes far beyond "add a User-Agent," practical scheduling options for continuous monitoring, and export methods that get your data into Google Sheets, databases, or any tool your team uses.
Quick summary:
- Python + curl_cffi + BeautifulSoup gives you full control and a ~94% success rate when combined with TLS impersonation
- Anti-blocking requires multiple layers: header rotation, TLS impersonation, proxy rotation, rate limiting, and exponential backoff
- Scheduling turns a one-off script into a continuous monitoring pipeline (cron, GitHub Actions, or Thunderbit's built-in scheduler)
- Export beyond CSV — Google Sheets, SQLite, Airtable, Notion — is where real business value lives
- Thunderbit offers a 2-click alternative for non-developers or anyone who'd rather spend their time analyzing data instead of debugging selectors
If you want to try the code, everything in this guide is ready to copy and run. And if you'd rather skip the coding entirely, Thunderbit lets you test the no-code approach on Amazon right away.
For more, see the other scraping guides on the Thunderbit blog, and watch step-by-step walkthroughs on the Thunderbit YouTube channel.
Happy scraping — and may your selectors survive until the next Amazon update.
FAQs
1. Why does my Python Amazon scraper get blocked after a few requests?
Amazon uses a six-layer defense system: IP reputation analysis, TLS fingerprinting (JA3/JA4), browser environment detection, behavioral biometrics, CAPTCHA challenges, and ML-driven anomaly detection. A basic requests script with just a User-Agent header achieves only about 2% success. You need TLS impersonation (curl_cffi), full header rotation, proxy rotation, and rate limiting with random jitter to maintain reliable access.
2. What Python libraries are best for scraping Amazon products in 2025?
curl_cffi for TLS-impersonated HTTP requests (the biggest single improvement), BeautifulSoup4 with lxml for HTML parsing, pandas for data structuring and export, and Selenium or Playwright as a fallback for JavaScript-rendered content. Python remains the most widely used language among scraping developers.
3. Is it legal to scrape Amazon product data?
Scraping publicly available data is generally legal in the US, supported by rulings like hiQ v. LinkedIn and Meta v. Bright Data. Amazon's Terms of Service prohibit automated access, but courts distinguish between ToS violations (civil) and criminal violations. Always avoid login-protected content, respect rate limits, and consult legal counsel for large-scale commercial use.
4. Can I scrape Amazon without writing any code?
Yes. Tools like let you scrape Amazon products in 2 clicks with a Chrome extension. Its AI-powered field detection automatically structures the data, and you can export to Excel, Google Sheets, Airtable, or Notion for free. When Amazon changes its HTML, Thunderbit's AI adapts without any manual updates.
5. How often does Amazon change its HTML selectors, and how do I keep my scraper updated?
Frequently and without notice. The scraping community reports that a significant share of crawlers need weekly fixes due to DOM changes. To stay ahead, monitor your scraper's success rate, store raw HTML for re-parsing, and check selectors against live pages regularly. Alternatively, AI-powered tools like Thunderbit adapt automatically, eliminating this maintenance burden.