Automate Your Market Research: Scrape Shopify with Python

Last Updated on April 16, 2026

Shopify's /products.json endpoint is one of the worst-kept secrets in ecommerce data. Append it to any Shopify store URL and you get structured JSON back—no API keys, no authentication, no scraping through nested HTML.

I work on the Thunderbit team, so I spend a lot of time thinking about how people extract data from the web. And Shopify scraping comes up constantly—sales teams tracking competitor prices, ecommerce ops people benchmarking product catalogs, procurement folks sourcing new vendors. With millions of merchants on Shopify and the platform holding a significant share of global ecommerce, the volume of scrapable product data is enormous.

This guide covers the entire process: what the endpoint returns, how to paginate through thousands of products, how to handle rate limits without getting blocked, and how to flatten Shopify's nested JSON into a clean CSV or Excel file using pandas. I'll also cover the endpoints nobody else talks about (/collections.json, /meta.json) and show a no-code alternative for people who'd rather skip Python entirely.

What Is Shopify's /products.json Endpoint (and Why It Makes Scraping Easy)

Every Shopify store has a public endpoint at {store-url}/products.json that returns structured product data. No API keys. No OAuth. No authentication of any kind. You literally append /products.json to the store URL and get back a JSON array of every product in the catalog.

Try it yourself right now: open allbirds.com/products.json or gymshark.com/products.json in your browser. You'll see clean, structured JSON with product titles, prices, variants, images, tags—everything.

Compare that to the alternative: parsing Shopify's HTML themes, which are deeply nested, inconsistent across stores, and change whenever a merchant updates their theme. Here's what you'd be dealing with:

The HTML approach (painful):

```html
<div class="product-card__info">
  <h3 class="product-card__title">
    <a href="/products/classic-blue-jeans">Classic Blue Jeans</a>
  </h3>
  <span class="price price--on-sale" data-product-price>$149.00</span>
</div>
```

The JSON approach (clean):

```json
{
  "title": "Classic Blue Jeans",
  "handle": "classic-blue-jeans",
  "vendor": "Hiut Denim",
  "variants": [{"price": "149.00", "sku": "HD-BLU-32", "available": true}]
}
```

JSON wins on consistency, reliability, and ease of parsing. The endpoint also supports two key query parameters—?limit= (up to 250 products per page, default is 30) and ?page= for pagination—which we'll use extensively in the code below.

Important distinction: this is a public storefront endpoint, not the Shopify Admin API. The Admin API requires store owner access tokens and provides order data, inventory levels, and customer information. The public /products.json endpoint is read-only product data that anyone can access. I'll cover the distinction in detail later, because forum confusion about this is rampant.

A caveat: not every Shopify store exposes this endpoint. In my testing, about 71% of stores return valid JSON (allbirds.com, gymshark.com, colourpop.com, kyliecosmetics.com all work), while some custom configurations return 404 (hiutdenim.co.uk, bombas.com). The quick check is simple: visit {store-url}/products.json in your browser and see what you get.

Why Scrape Shopify with Python? Top Business Use Cases

Why bother? ROI. A growing share of retailers now use automated price scraping for competitive intelligence—up from just 34% in 2020—and research consistently links pricing intelligence to measurable revenue gains. The data is worth real money.

Here are the most common use cases I see:

| Use Case | Who Benefits | What You Get |
| --- | --- | --- |
| Competitor price monitoring | Ecommerce ops teams | Track pricing changes, discounts, and compare-at prices across competitor catalogs |
| Product research & sourcing | Procurement / merchandising | Compare product features, variants, materials, and availability |
| Lead generation | Sales teams | Extract vendor names, brand data, and contact info from store catalogs |
| Market & category analysis | Marketing teams | Understand product mix, tags, collection structure, and positioning |
| Inventory & availability tracking | Supply chain teams | Monitor variant-level stock status (available: true/false) over time |
| New product detection | Product teams | Track created_at timestamps to spot new launches from competitors |
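For example, the "new product detection" row boils down to filtering on created_at. A minimal sketch (recent_products is my own helper name, not part of any Shopify tooling):

```python
from datetime import datetime, timedelta, timezone

def recent_products(products, days=30, now=None):
    """Return products whose created_at falls within the last `days` days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    recent = []
    for p in products:
        created = p.get("created_at")
        if not created:
            continue
        # Shopify timestamps are ISO 8601 with an offset, e.g. "2024-01-15T10:30:00-05:00"
        if datetime.fromisoformat(created) >= cutoff:
            recent.append(p)
    return recent
```

Run it on a scheduled scrape of a competitor's catalog and you get only the products launched in the last month.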

Python is the natural fit for this work. It's the most widely used language among data practitioners, and the ecosystem—requests for HTTP, pandas for data manipulation, httpx for async—makes it straightforward to go from "I have a URL" to "I have a spreadsheet" in under 80 lines of code.

Complete products.json Field Reference: Every Field Explained

Every competing tutorial shows you title, id, and handle, then moves on. Shopify's JSON response contains over 40 fields across products, variants, images, and options. Knowing what's available before you write your scraping code saves you from re-scraping later.

I pulled this reference from live /products.json responses fetched on April 16, 2026. The structure is consistent across all stores that expose the endpoint.

Product-Level Fields

| Field | Data Type | Example Value | Business Use Case |
| --- | --- | --- | --- |
| id | Integer | 123456789 | Unique product identifier for deduplication |
| title | String | "Classic Blue Jeans" | Product name for catalogs and comparisons |
| handle | String | "classic-blue-jeans" | URL slug—construct product page links as {store}/products/{handle} |
| body_html | String (HTML) or null | "Our best-selling..." | Product description for content analysis and SEO research |
| vendor | String | "Hiut Denim" | Brand/vendor name for lead gen or sourcing |
| product_type | String | "Jeans" | Category classification for market analysis |
| created_at | ISO DateTime | "2024-01-15T10:30:00-05:00" | Track when products were added (new launch detection) |
| updated_at | ISO DateTime | "2025-03-01T08:00:00-05:00" | Detect recent catalog changes |
| published_at | ISO DateTime | "2024-01-16T00:00:00-05:00" | Know when products went live on the storefront |
| tags | Array of Strings | ["organic", "women", "straight-leg"] | Keyword/tag analysis for SEO, categorization, and trend spotting |
| variants | Array of Objects | (see variant fields below) | Price, SKU, availability per variant |
| images | Array of Objects | (see image fields below) | Product image URLs for catalogs and visual analysis |
| options | Array of Objects | [{"name": "Size", "values": ["S","M","L"]}] | Understand product configuration (size, color, material) |

Variant-Level Fields (nested under each product)

| Field | Data Type | Example | Use Case |
| --- | --- | --- | --- |
| id | Integer | 987654321 | Unique variant identifier |
| title | String | "32 / Blue" | Variant display name |
| sku | String | "HD-BLU-32" | SKU matching for inventory systems |
| price | String | "185.00" | Price monitoring (note: it's a string, cast to float for math) |
| compare_at_price | String or null | "200.00" | Original price—essential for discount tracking |
| available | Boolean | true | Stock availability (the only public stock indicator) |
| weight | Float | 1.2 | Shipping/logistics analysis |
| option1, option2, option3 | String | "32", "Blue", null | Individual option values |
| created_at, updated_at | ISO DateTime | — | Variant-level change tracking |

Image-Level Fields

| Field | Data Type | Example | Use Case |
| --- | --- | --- | --- |
| id | Integer | 111222333 | Unique image identifier |
| src | String (URL) | "https://cdn.shopify.com/..." | Direct image download link |
| alt | String or null | "Front view of jeans" | Image alt text for accessibility analysis |
| position | Integer | 1 | Image ordering |
| width, height | Integer | 2048, 2048 | Image dimensions |

What's NOT in the Public Endpoint

A critical gotcha: inventory_quantity is NOT available on public /products.json responses. This field was removed from public-facing JSON endpoints in December 2017 for security reasons. The only stock indicator you get is the boolean available field on each variant (true or false). To access actual inventory counts, you need the authenticated Admin API with store owner credentials.

Before writing your scraping code, scan this table and decide which fields matter for your use case. If you're doing price monitoring, you need variants[].price, variants[].compare_at_price, and variants[].available. If you're doing lead gen, focus on vendor, product_type, and tags. Filter accordingly—your CSV will be much cleaner.
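Since available is the only public stock signal, availability tracking reduces to diffing snapshots between scrape runs. A minimal sketch, assuming you persist one {variant_id: available} mapping per run (availability_changes is a hypothetical helper name):

```python
def availability_changes(old_snapshot, new_snapshot):
    """Report variants whose `available` flag flipped between two scrape runs.

    Each snapshot maps variant_id -> available, e.g. built with:
    {v["id"]: v["available"] for p in products for v in p.get("variants", [])}
    """
    changes = []
    for vid, available in new_snapshot.items():
        # Only report variants present in both snapshots whose flag changed
        if vid in old_snapshot and old_snapshot[vid] != available:
            changes.append((vid, old_snapshot[vid], available))
    return changes
```

Run this weekly against a competitor's catalog and a flip from True to False is a sell-out signal; the reverse is a restock.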

Beyond products.json: Collections, Meta, and Other Shopify Endpoints

No competing tutorial mentions these endpoints. They're essential for serious competitive intelligence work.

/collections.json — All Store Categories

Returns every collection (category) in the store with titles, handles, descriptions, and product counts. I verified this on zoologistperfumes.com, allbirds.com, and gymshark.com—all returned valid JSON.

```json
{
  "collections": [
    {
      "id": 308387348539,
      "title": "Attars",
      "handle": "attars",
      "published_at": "2026-03-29T12:20:32-04:00",
      "products_count": 1,
      "image": { "src": "https://cdn.shopify.com/..." }
    }
  ]
}
```

Want to understand how a competitor organizes their catalog? This is the endpoint.
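A quick sketch of pulling that catalog map; fetch_collections and summarize_collections are my own helper names, not a Shopify API:

```python
import requests

def fetch_collections(store_url, timeout=30):
    """Fetch the store's public /collections.json endpoint."""
    url = f"{store_url.rstrip('/')}/collections.json"
    resp = requests.get(
        url,
        headers={"User-Agent": "Mozilla/5.0 (compatible; ProductResearch/1.0)"},
        timeout=timeout,
    )
    resp.raise_for_status()
    return resp.json().get("collections", [])

def summarize_collections(collections):
    """Reduce each collection to (title, handle, product count)."""
    return [
        (c["title"], c["handle"], c.get("products_count", 0))
        for c in collections
    ]
```

On a very large store this endpoint may paginate like /products.json, so check whether one response covers everything before trusting the counts.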

/collections/{handle}/products.json — Products by Category

Returns products filtered by a specific collection. Same JSON structure as /products.json, but scoped to one category. This is critical for category-level scraping—say you only want to monitor a competitor's "Sale" or "New Arrivals" collection.
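Building the scoped URL is a one-liner; the sketch below slots straight into the pagination loop shown later in this guide (collection_products_url is a hypothetical helper, and example-store.com a placeholder domain):

```python
def collection_products_url(store_url, handle, page=1, limit=250):
    """Build the category-scoped products URL for a collection handle."""
    return (f"{store_url.rstrip('/')}/collections/{handle}/products.json"
            f"?limit={limit}&page={page}")
```

Feed the result to the same session and retry logic you use for /products.json; only the base URL changes.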

/meta.json — Store-Level Metadata

Returns store name, description, currency, country, and—here's the good part—published_products_count. That count lets you pre-calculate exactly how many pagination pages you'll need: ceil(published_products_count / 250). No more blindly incrementing pages until you hit an empty response.
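That pre-calculation looks like this; pages_needed and planned_pages are my own helper names, assuming the /meta.json response shape described above:

```python
import math
import requests

def pages_needed(product_count, limit=250):
    """How many /products.json requests a catalog of this size takes."""
    return math.ceil(product_count / limit)

def planned_pages(store_url, limit=250, timeout=30):
    """Read published_products_count from /meta.json and pre-compute pagination."""
    url = f"{store_url.rstrip('/')}/meta.json"
    resp = requests.get(url, timeout=timeout)
    resp.raise_for_status()
    return pages_needed(resp.json().get("published_products_count", 0), limit)
```

For a 1,420-product store like Allbirds, pages_needed(1420) tells you up front that six requests will cover the whole catalog.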

Which Endpoint Should You Use?

| What You Want | Endpoint | Auth Needed? |
| --- | --- | --- |
| All products (public) | /products.json | No |
| Products in a specific category | /collections/{handle}/products.json | No |
| Store metadata + product count | /meta.json | No |
| All collections (categories) | /collections.json | No |
| Order/sales data (own store only) | Admin API /orders.json | Yes (API key) |
| Inventory quantities (own store only) | Admin API /inventory_levels.json | Yes |

The recurring forum question—"Can I scrape how many units a competitor sold?"—has a short answer: no. Not from public endpoints. Sales data and inventory quantities require the authenticated Admin API, which means you need store owner access. Public endpoints give you product catalog data only.


How to Scrape Shopify with Python: Step-by-Step Setup

  • Difficulty: Beginner
  • Time Required: ~15 minutes (setup + first scrape)
  • What You'll Need: Python 3.11+, pip, a terminal, and a Shopify store URL to scrape

Step 1: Install Python and Required Libraries

Make sure you have Python 3.11 or newer installed (pandas 3.0.x requires it). Then install the two libraries we need:

```bash
pip install requests pandas
```

For Excel export, you'll also want:

```bash
pip install openpyxl
```

At the top of your script, add these imports:

```python
import requests
import pandas as pd
import time
import random
import json
```

You should see no import errors when you run the script. If pandas throws a version error, double-check that you're on Python 3.11 or newer.

Step 2: Fetch Product Data from /products.json

Here's a basic function that takes a store URL, hits the endpoint, and returns parsed JSON:

```python
def fetch_products_page(store_url, page=1, limit=250):
    """Fetch a single page of products from a Shopify store."""
    url = f"{store_url.rstrip('/')}/products.json"
    params = {"limit": limit, "page": page}
    headers = {
        "User-Agent": "Mozilla/5.0 (compatible; ProductResearch/1.0)"
    }
    response = requests.get(url, params=params, headers=headers, timeout=30)
    response.raise_for_status()
    return response.json().get("products", [])
```

Key details:

  • limit=250 is the maximum Shopify allows per page. The default is 30, so setting this explicitly reduces your total requests by up to 8x.
  • User-Agent header: Always set a realistic one. Requests without a User-Agent are more likely to trigger Shopify's anti-bot systems.
  • timeout=30: Don't let a single request hang forever.

Test it with a known store:

```python
products = fetch_products_page("https://allbirds.com")
print(f"Fetched {len(products)} products")
print(f"First product: {products[0]['title']}")
```

You should see something like: Fetched 250 products and the first product title.

Step 3: Handle Pagination to Scrape All Products

A single request returns at most 250 products. Most stores have more than that (Allbirds has 1,420+). You need to loop through pages until you get an empty response.

```python
def scrape_all_products(store_url, delay=1.0):
    """Scrape all products from a Shopify store, handling pagination."""
    all_products = []
    page = 1
    while True:
        print(f"Fetching page {page}...")
        products = fetch_products_page(store_url, page=page, limit=250)
        if not products:
            print(f"No more products. Total: {len(all_products)}")
            break
        all_products.extend(products)
        print(f"  Got {len(products)} products (total so far: {len(all_products)})")
        page += 1
        # Be polite: wait between requests
        time.sleep(delay + random.uniform(0, 0.5))
    return all_products
```

When products comes back empty, you've reached the end.

The time.sleep() with random jitter keeps you under Shopify's informal rate limit (~2 requests/second).

Pro tip: If you fetched /meta.json first, you already know the total product count and can calculate exactly how many pages you need: pages = ceil(product_count / 250). This avoids the "one extra empty request at the end" pattern.

Step 4: Parse and Select the Fields You Need

Now that you have all products as a Python list of dictionaries, extract just the fields you care about. Here's an example that pulls the most common fields for price monitoring:

```python
def extract_product_data(products):
    """Extract key fields from products, flattening variants."""
    rows = []
    for product in products:
        for variant in product.get("variants", []):
            rows.append({
                "product_id": product["id"],
                "title": product["title"],
                "handle": product["handle"],
                "vendor": product.get("vendor", ""),
                "product_type": product.get("product_type", ""),
                "tags": ", ".join(product.get("tags", [])),
                "created_at": product.get("created_at", ""),
                "variant_id": variant["id"],
                "variant_title": variant.get("title", ""),
                "sku": variant.get("sku", ""),
                "price": variant.get("price", ""),
                "compare_at_price": variant.get("compare_at_price", ""),
                "available": variant.get("available", ""),
                "image_url": product["images"][0]["src"] if product.get("images") else ""
            })
    return rows
```

This creates one row per variant—the most useful format for price comparison, since a single product like "Classic Blue Jeans" might have 12 variants (6 sizes × 2 colors), each with its own price and availability status.

Export Scraped Shopify Data to CSV and Excel with pandas

Every other Shopify scraping tutorial dumps raw JSON to a file and calls it done. Fine for developers. Useless for the ecommerce analyst who needs a spreadsheet by Friday.

The problem: Shopify's JSON is nested. One product can contain a dozen variants, each with its own price, SKU, and availability. Flattening that into rows and columns takes some pandas work.

Flatten Nested JSON into a Clean Table

There are two approaches, depending on your use case:

Option A: One row per variant (best for price monitoring and inventory tracking)

```python
# Using the extract_product_data function from Step 4
products = scrape_all_products("https://allbirds.com")
rows = extract_product_data(products)
df = pd.DataFrame(rows)
print(f"DataFrame shape: {df.shape}")
print(df.head())
```

This gives you a flat table where each row is a unique product-variant combination. A store with 500 products and an average of 4 variants per product yields a ~2,000-row DataFrame.

Option B: One row per product summary (best for catalog overviews)

```python
def summarize_products(products):
    """One row per product with min/max price across variants."""
    rows = []
    for product in products:
        prices = [float(v["price"]) for v in product.get("variants", []) if v.get("price")]
        rows.append({
            "product_id": product["id"],
            "title": product["title"],
            "vendor": product.get("vendor", ""),
            "product_type": product.get("product_type", ""),
            "variant_count": len(product.get("variants", [])),
            "min_price": min(prices) if prices else None,
            "max_price": max(prices) if prices else None,
            "any_available": any(v.get("available", False) for v in product.get("variants", [])),
            "tags": ", ".join(product.get("tags", []))
        })
    return rows
```

Export to CSV, Excel, and Google Sheets

```python
# CSV export (use utf-8-sig so Excel handles special characters)
df.to_csv("shopify_products.csv", index=False, encoding="utf-8-sig")

# Excel export (requires openpyxl)
df.to_excel("shopify_products.xlsx", index=False, engine="openpyxl")
print("Exported to shopify_products.csv and shopify_products.xlsx")
```

For Google Sheets, you can use the gspread library with a service account, but honestly—for most use cases, exporting to CSV and uploading to Google Drive is faster and simpler.

Production-Ready Python Scraping: Rate Limits, Retries, and Anti-Blocking

The basic script handles small stores fine. Scraping a store with 5,000+ products, or hitting multiple stores in sequence? That's where things break.

Understanding Shopify's Rate Limits and Blocking Behavior

Shopify's public JSON endpoints don't have formally documented rate limits (unlike the Admin API's leaky bucket model), but empirical testing reveals:

  • Safe rate: ~2 requests per second per store
  • Soft ceiling: ~40 requests per minute before throttling kicks in
  • HTTP 429: "Too Many Requests"—the standard rate-limit response
  • HTTP 430: A Shopify-specific code indicating a security-level block (not just rate limiting)
  • HTTP 403 or CAPTCHA redirect: Some stores with additional Cloudflare protection

Requests from shared cloud infrastructure (AWS Lambda, Google Cloud Run) are particularly likely to trigger blocks because those IP ranges have high abuse rates.

Techniques to Scrape Shopify Reliably

Here's the progression from "works on my laptop" to "runs in production":

| Level | Technique | Reliability |
| --- | --- | --- |
| Basic | requests.get() + ?page= | Breaks on large catalogs, may get blocked |
| Intermediate | requests.Session() + ?limit=250 + time.sleep(1) + retry on 429 | Works for most stores |
| Advanced | Async httpx + rotating User-Agent + exponential backoff | Production-grade, scales to 10K+ products |

Intermediate level (recommended for most users):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def create_session():
    """Create a requests session with automatic retry logic."""
    session = requests.Session()
    retries = Retry(
        total=5,
        backoff_factor=1,  # retry sleeps roughly double: ~1s, 2s, 4s, 8s...
        status_forcelist=[429, 430, 500, 502, 503, 504],
        respect_retry_after_header=True
    )
    session.mount("https://", HTTPAdapter(max_retries=retries))
    session.headers.update({
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36"
    })
    return session
```

The Retry configuration handles 429 responses automatically with exponential backoff. With backoff_factor=1, the sleep between retries roughly doubles on each attempt (on the order of 1s → 2s → 4s → 8s; the exact sequence depends on your urllib3 version). Session reuse (requests.Session()) also gives you connection pooling, which reduces overhead when making multiple requests to the same domain.

User-Agent rotation: If you're scraping multiple stores, rotate between 3–5 realistic browser User-Agent strings. This isn't about deception—it's about not looking like a bot that sends identical headers on every request.
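A minimal sketch of that rotation; the pool below holds illustrative desktop User-Agent strings, and rotating_headers is my own helper name:

```python
import random

# Illustrative desktop User-Agent strings -- swap in whatever recent browsers you like
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
]

def rotating_headers():
    """Pick a User-Agent at random -- call once per store or per session."""
    return {"User-Agent": random.choice(USER_AGENTS)}
```

Calling session.headers.update(rotating_headers()) when you create each session gives every store a slightly different fingerprint without changing anything else in the scraper.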

Full Working Python Script to Scrape Shopify with CSV Export

Here's the complete, copy-paste-ready script that combines everything above. It's about 75 lines of actual code (plus comments), and I've tested it against Allbirds (1,420 products), ColourPop (2,000+ products), and Zoologist Perfumes (small catalog).

```python
import requests
import pandas as pd
import time
import random
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def create_session():
    """Create a session with retry logic for rate limits."""
    session = requests.Session()
    retries = Retry(
        total=5,
        backoff_factor=1,
        status_forcelist=[429, 430, 500, 502, 503, 504],
        respect_retry_after_header=True
    )
    session.mount("https://", HTTPAdapter(max_retries=retries))
    session.headers.update({
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                      "AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/125.0.0.0 Safari/537.36"
    })
    return session


def scrape_shopify(store_url, delay=1.0):
    """Scrape all products from a Shopify store via /products.json."""
    session = create_session()
    all_products = []
    page = 1
    base_url = f"{store_url.rstrip('/')}/products.json"
    while True:
        print(f"  Page {page}...", end=" ")
        resp = session.get(base_url, params={"limit": 250, "page": page}, timeout=30)
        resp.raise_for_status()
        products = resp.json().get("products", [])
        if not products:
            break
        all_products.extend(products)
        print(f"{len(products)} products (total: {len(all_products)})")
        page += 1
        time.sleep(delay + random.uniform(0, 0.5))
    return all_products


def flatten_to_variants(products):
    """Flatten nested product JSON into one row per variant."""
    rows = []
    for p in products:
        base = {
            "product_id": p["id"],
            "title": p["title"],
            "handle": p["handle"],
            "vendor": p.get("vendor", ""),
            "product_type": p.get("product_type", ""),
            "tags": ", ".join(p.get("tags", [])),
            "created_at": p.get("created_at", ""),
            "updated_at": p.get("updated_at", ""),
            "image_url": p["images"][0]["src"] if p.get("images") else "",
        }
        for v in p.get("variants", []):
            row = {**base}
            row["variant_id"] = v["id"]
            row["variant_title"] = v.get("title", "")
            row["sku"] = v.get("sku", "")
            row["price"] = v.get("price", "")
            row["compare_at_price"] = v.get("compare_at_price", "")
            row["available"] = v.get("available", "")
            rows.append(row)
    return rows


if __name__ == "__main__":
    STORE_URL = "https://allbirds.com"  # Change this to your target store
    OUTPUT_CSV = "shopify_products.csv"
    OUTPUT_EXCEL = "shopify_products.xlsx"

    print(f"Scraping {STORE_URL}...")
    products = scrape_shopify(STORE_URL)
    print(f"\nTotal products scraped: {len(products)}")

    print("Flattening to variant-level rows...")
    rows = flatten_to_variants(products)
    df = pd.DataFrame(rows)
    print(f"DataFrame: {df.shape[0]} rows x {df.shape[1]} columns")

    df.to_csv(OUTPUT_CSV, index=False, encoding="utf-8-sig")
    df.to_excel(OUTPUT_EXCEL, index=False, engine="openpyxl")
    print(f"\nExported to {OUTPUT_CSV} and {OUTPUT_EXCEL}")
```

Run it with python scrape_shopify.py. For Allbirds, this takes about 45 seconds and produces a CSV with ~5,000+ rows (one per variant). The terminal output looks something like:

```
Scraping https://allbirds.com...
  Page 1... 250 products (total: 250)
  Page 2... 250 products (total: 500)
  ...
  Page 6... 170 products (total: 1420)
Total products scraped: 1420
Flattening to variant-level rows...
DataFrame: 5680 rows x 14 columns
Exported to shopify_products.csv and shopify_products.xlsx
```

Skip the Python: Scrape Shopify in 2 Clicks with Thunderbit (No-Code Alternative)

Not everyone wants to install Python, debug import errors, or maintain a scraping script. For the sales rep who needs competitor pricing by tomorrow morning, Python is overkill.

That's why we built Thunderbit—an AI web scraper that runs as a Chrome extension. No code, no API keys, no environment setup.

How Thunderbit Scrapes Shopify Stores

Thunderbit has a dedicated Shopify Scraper template that's pre-configured for Shopify product pages. You install the Chrome extension, navigate to a Shopify store, and click "Scrape." The template automatically extracts product names, descriptions, prices, variant details, images, and vendor information.

For stores where the template doesn't perfectly match (custom themes, unusual layouts), Thunderbit's AI Suggest Fields feature reads the page and auto-generates column names. You can customize these—rename columns, add fields, write instructions like "only extract products with compare_at_price set."

A few features that map directly to what the Python script does:

  • Subpage scraping: Automatically visits each product detail page and enriches the table with full descriptions, reviews, or variant details—the same thing our Python script achieves by iterating through pages, but with zero code.
  • Automatic pagination: Handles click-through pagination and infinite scroll without configuration.
  • Scheduled scraping: Set up recurring jobs (e.g., "every Monday at 9am") for ongoing price monitoring—no cron job or server needed.
  • Free export to CSV, Excel, Google Sheets, Airtable, or Notion across all plans.

Python Script vs. Thunderbit: Honest Comparison

| Factor | Python Script | Thunderbit (No-Code) |
| --- | --- | --- |
| Setup time | 15–60 min (environment + code) | ~2 min (install Chrome extension) |
| Coding required | Yes (Python) | None |
| Customization | Unlimited | AI-suggested fields + custom prompts |
| Pagination handling | Must code manually | Automatic |
| Export formats | Code it yourself (CSV/Excel) | CSV, Excel, Google Sheets, Airtable, Notion (free) |
| Scheduled runs | Cron job + hosting | Built-in scheduler |
| Rate-limit handling | Must code retries/backoff | Handled automatically |
| Best for | Developers, large-scale data pipelines | Business users, quick extractions, recurring monitoring |

Use Python when you need full control or are integrating into a larger data pipeline. Use Thunderbit when you need data fast and don't want to maintain code. We've written a separate guide that compares the two approaches in more depth.


Tips and Best Practices for Scraping Shopify Stores

These hold regardless of your tool choice:

  • Always use ?limit=250 to minimize total requests. The default of 30 per page means 8x more requests for the same data.
  • Respect the store: Add 1–2 second delays between requests. Hammering a server with rapid-fire requests is bad practice and increases your chances of getting blocked.
  • Check robots.txt first: Shopify's default robots.txt does NOT block /products.json. But some stores add custom rules, so verify before scraping at scale.
  • Store raw JSON locally first, then process. If your parsing logic changes later, you won't need to re-scrape. A quick with open("raw_data.json", "w") as f: json.dump(all_products, f) before flattening saves headaches.
  • Deduplicate by product.id: Pagination edge cases can sometimes return duplicate products at page boundaries. A quick df.drop_duplicates(subset=["product_id", "variant_id"]) cleans this up.
  • Cast price to float before doing math. Shopify returns prices as strings ("185.00"), not numbers.
  • Monitor for endpoint changes: While /products.json has been stable for years, Shopify could theoretically restrict it. If your scraper suddenly gets 404s, check the store manually first.
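The casting and dedup tips above combine into one small pandas post-processing step; clean_variants is a hypothetical helper name for illustration:

```python
import pandas as pd

def clean_variants(df):
    """Post-process a variant-level DataFrame: dedupe, cast prices, add discount %."""
    # Drop duplicate rows introduced by pagination edge cases
    out = df.drop_duplicates(subset=["product_id", "variant_id"]).copy()
    # Shopify returns prices as strings ("185.00"); cast before doing math
    out["price"] = pd.to_numeric(out["price"], errors="coerce")
    out["compare_at_price"] = pd.to_numeric(out["compare_at_price"], errors="coerce")
    # Discount depth relative to the original (compare-at) price
    out["discount_pct"] = (
        (out["compare_at_price"] - out["price"]) / out["compare_at_price"] * 100
    ).round(1)
    return out
```

Variants with no compare_at_price simply get a NaN discount, which keeps them out of any "deepest discounts" sort.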

For more on building robust scrapers, check out our other web scraping guides.

Is It Legal to Scrape Shopify Stores?

Brief section, but it matters.

The /products.json endpoint serves publicly available product data—the same information any visitor sees when browsing the store. Shopify's Terms of Service include language about not using "automated means" to access "the Services," but this language refers to the platform itself (admin dashboard, checkout), not public storefront data. No Shopify-specific scraping lawsuits have been filed as of April 2026.

Key legal precedents support public data scraping: the hiQ v. LinkedIn case established that scraping publicly accessible data doesn't violate the CFAA, and Meta v. Bright Data (2024) ruled that TOS restrictions only apply when a user is logged in.

Best practices:

  • Only scrape publicly available product data
  • Don't scrape personal or customer data
  • Respect robots.txt and rate limits
  • Comply with GDPR/CCPA if handling any personal data (product catalog data is non-personal)
  • Identify yourself with a clear User-Agent string
  • Scraping your own Shopify store via the Admin API is always fine

For a deeper dive, see our post on the legal side of web scraping.

Conclusion and Key Takeaways

Shopify's public /products.json endpoint makes ecommerce data extraction about as easy as it gets. The workflow is: append /products.json → fetch with Python → paginate with ?limit=250&page= → flatten with pandas → export to CSV or Excel.

What this guide covers that others don't:

  • Complete field reference: Know exactly what data is available (40+ fields across products, variants, and images) before you write a single line of code
  • Additional endpoints: /collections.json and /meta.json give you category-level intelligence and store metadata that no competing tutorial covers
  • Production-ready techniques: Session reuse, exponential backoff, User-Agent headers, and ?limit=250 to handle real-world rate limits
  • Proper CSV/Excel export: Flattened variant-level data using pandas, not just raw JSON dumps
  • No-code alternative: Thunderbit, for users who prefer speed over code flexibility

For one-off or recurring Shopify data pulls without code, try the Thunderbit Chrome extension—the Shopify Scraper template handles everything from pagination to export. For custom data pipelines or scraping at scale across many stores, the Python script in this guide gives you full control.

Check out our video walkthroughs, or explore our related scraping guides for more techniques.

Try Thunderbit for Shopify Scraping

FAQs

Can you scrape any Shopify store with products.json?

Most Shopify stores expose this endpoint by default—in testing, about 71% returned valid JSON. Some stores with custom configurations or additional security layers (Cloudflare, headless setups) may return 404 or block the request. The quick check: visit {store-url}/products.json in your browser. If you see JSON, you're good.

Public product data (prices, titles, images, descriptions) is generally accessible, and legal precedents like hiQ v. LinkedIn support scraping publicly available information. That said, always check the specific store's Terms of Service and your local laws. Don't scrape personal or customer data, and respect rate limits.

How many products can you scrape from a Shopify store?

There's no hard limit on the total number. Pagination with ?limit=250&page= lets you retrieve the entire catalog. For very large stores (25,000+ products), use session reuse and delays to avoid rate limits. The /meta.json endpoint can tell you the exact product count upfront so you know how many pages to expect.

What's the difference between products.json and the Shopify Admin API?

/products.json is a public endpoint—no authentication, read-only product data, accessible to anyone. The Admin API requires store owner access tokens and provides orders, inventory quantities, customer data, and write access. If you need sales data or actual inventory counts, you need Admin API access (which means you need to be the store owner or have their permission).

Can I scrape Shopify without Python?

Absolutely. Tools like Thunderbit let you scrape Shopify stores from a Chrome extension with no code. It handles pagination automatically and exports directly to CSV, Excel, Google Sheets, Airtable, or Notion. For developers who prefer other languages, the same /products.json endpoint works with JavaScript, Ruby, Go—any language that can make HTTP requests and parse JSON.

Ke
CTO @ Thunderbit. Ke is the person everyone pings when data gets messy. He's spent his career turning tedious, repetitive work into quiet little automations that just run. If you've ever wished a spreadsheet could fill itself in, Ke has probably already built the thing that does it.