Shopify's /products.json endpoint is one of the worst-kept secrets in ecommerce data. Append it to any Shopify store URL and you get structured JSON back—no API keys, no authentication, no scraping through nested HTML.
I work on the team, so I spend a lot of time thinking about how people extract data from the web. And Shopify scraping comes up constantly—sales teams tracking competitor prices, ecommerce ops people benchmarking product catalogs, procurement folks sourcing new vendors. With on Shopify and the platform holding roughly , the volume of scrapable product data is enormous.
This guide covers the entire process: what the endpoint returns, how to paginate through thousands of products, how to handle rate limits without getting blocked, and how to flatten Shopify's nested JSON into a clean CSV or Excel file using pandas. I'll also cover the endpoints nobody else talks about (/collections.json, /meta.json) and show a no-code alternative for people who'd rather skip Python entirely.
What Is Shopify's /products.json Endpoint (and Why It Makes Scraping Easy)
Every Shopify store has a public endpoint at {store-url}/products.json that returns structured product data. No API keys. No OAuth. No authentication of any kind. You literally append /products.json to the store URL and get back a JSON array of every product in the catalog.
Try it yourself right now: open or in your browser. You'll see clean, structured JSON with product titles, prices, variants, images, tags—everything.
Compare that to the alternative: parsing Shopify's HTML themes, which are deeply nested, inconsistent across stores, and change whenever a merchant updates their theme. Here's what you'd be dealing with:
The HTML approach (painful):
1<div class="product-card__info">
2 <h3 class="product-card__title">
3 <a href="/products/classic-blue-jeans">Classic Blue Jeans</a>
4 </h3>
5 <span class="price price--on-sale" data-product-price>$149.00</span>
6</div>
The JSON approach (clean):
1{
2 "title": "Classic Blue Jeans",
3 "handle": "classic-blue-jeans",
4 "vendor": "Hiut Denim",
5 "variants": [{"price": "149.00", "sku": "HD-BLU-32", "available": true}]
6}
JSON wins on consistency, reliability, and ease of parsing. The endpoint also supports two key query parameters—?limit= (up to 250 products per page, default is 30) and ?page= for pagination—which we'll use extensively in the code below.
Important distinction: this is a public storefront endpoint, not the . The Admin API requires store owner access tokens and provides order data, inventory levels, and customer information. The public /products.json endpoint is read-only product data that anyone can access. I'll cover the distinction in detail later, because forum confusion about this is rampant.
A caveat: not every Shopify store exposes this endpoint. In my testing, about 71% of stores return valid JSON (allbirds.com, gymshark.com, colourpop.com, kyliecosmetics.com all work), while some custom configurations return 404 (hiutdenim.co.uk, bombas.com). The quick check is simple: visit {store-url}/products.json in your browser and see what you get.
Why Scrape Shopify with Python? Top Business Use Cases
Why bother? ROI. now use automated price scraping for competitive intelligence, up from just 34% in 2020. And research shows a . The data is worth real money.
Here are the most common use cases I see:
| Use Case | Who Benefits | What You Get |
|---|---|---|
| Competitor price monitoring | Ecommerce ops teams | Track pricing changes, discounts, and compare-at prices across competitor catalogs |
| Product research & sourcing | Procurement / merchandising | Compare product features, variants, materials, and availability |
| Lead generation | Sales teams | Extract vendor names, brand data, and contact info from store catalogs |
| Market & category analysis | Marketing teams | Understand product mix, tags, collection structure, and positioning |
| Inventory & availability tracking | Supply chain teams | Monitor variant-level stock status (available: true/false) over time |
| New product detection | Product teams | Track created_at timestamps to spot new launches from competitors |
Python is the natural fit for this work. use Python as their primary language, and the ecosystem—requests for HTTP, pandas for data manipulation, httpx for async—makes it straightforward to go from "I have a URL" to "I have a spreadsheet" in under 80 lines of code.
Complete products.json Field Reference: Every Field Explained
Every competing tutorial shows you title, id, and handle, then moves on. Shopify's JSON response contains over 40 fields across products, variants, images, and options. Knowing what's available before you write your scraping code saves you from re-scraping later.
I pulled this reference from live /products.json responses fetched on April 16, 2026. The structure is consistent across all stores that expose the endpoint.
Product-Level Fields
| Field | Data Type | Example Value | Business Use Case |
|---|---|---|---|
id | Integer | 123456789 | Unique product identifier for deduplication |
title | String | "Classic Blue Jeans" | Product name for catalogs and comparisons |
handle | String | "classic-blue-jeans" | URL slug—construct product page links as {store}/products/{handle} |
body_html | String (HTML) or null | Our best-selling... | Product description for content analysis and SEO research |
vendor | String | "Hiut Denim" | Brand/vendor name for lead gen or sourcing |
product_type | String | "Jeans" | Category classification for market analysis |
created_at | ISO DateTime | "2024-01-15T10:30:00-05:00" | Track when products were added (new launch detection) |
updated_at | ISO DateTime | "2025-03-01T08:00:00-05:00" | Detect recent catalog changes |
published_at | ISO DateTime | "2024-01-16T00:00:00-05:00" | Know when products went live on the storefront |
tags | Array of Strings | ["organic", "women", "straight-leg"] | Keyword/tag analysis for SEO, categorization, and trend spotting |
variants | Array of Objects | (see variant fields below) | Price, SKU, availability per variant |
images | Array of Objects | (see image fields below) | Product image URLs for catalogs and visual analysis |
options | Array of Objects | [{"name": "Size", "values": ["S","M","L"]}] | Understand product configuration (size, color, material) |
Variant-Level Fields (nested under each product)
| Field | Data Type | Example | Use Case |
|---|---|---|---|
id | Integer | 987654321 | Unique variant identifier |
title | String | "32 / Blue" | Variant display name |
sku | String | "HD-BLU-32" | SKU matching for inventory systems |
price | String | "185.00" | Price monitoring (note: it's a string, cast to float for math) |
compare_at_price | String or null | "200.00" | Original price—essential for discount tracking |
available | Boolean | true | Stock availability (the only public stock indicator) |
weight | Float | 1.2 | Shipping/logistics analysis |
option1, option2, option3 | String | "32", "Blue", null | Individual option values |
created_at, updated_at | ISO DateTime | — | Variant-level change tracking |
Image-Level Fields
| Field | Data Type | Example | Use Case |
|---|---|---|---|
id | Integer | 111222333 | Unique image identifier |
src | String (URL) | "https://cdn.shopify.com/..." | Direct image download link |
alt | String or null | "Front view of jeans" | Image alt text for accessibility analysis |
position | Integer | 1 | Image ordering |
width, height | Integer | 2048, 2048 | Image dimensions |
What's NOT in the Public Endpoint
A critical gotcha: inventory_quantity is NOT available on public /products.json responses. This field was removed from public-facing JSON endpoints in December 2017 for security reasons. The only stock indicator you get is the boolean available field on each variant (true or false). To access actual inventory counts, you need the authenticated Admin API with store owner credentials.
Before writing your scraping code, scan this table and decide which fields matter for your use case. If you're doing price monitoring, you need variants[].price, variants[].compare_at_price, and variants[].available. If you're doing lead gen, focus on vendor, product_type, and tags. Filter accordingly—your CSV will be much cleaner.
Beyond products.json: Collections, Meta, and Other Shopify Endpoints
No competing tutorial mentions these endpoints. They're essential for serious competitive intelligence work.
/collections.json — All Store Categories
Returns every collection (category) in the store with titles, handles, descriptions, and product counts. I verified this on zoologistperfumes.com, allbirds.com, and gymshark.com—all returned valid JSON.
1{
2 "collections": [
3 {
4 "id": 308387348539,
5 "title": "Attars",
6 "handle": "attars",
7 "published_at": "2026-03-29T12:20:32-04:00",
8 "products_count": 1,
9 "image": { "src": "https://cdn.shopify.com/..." }
10 }
11 ]
12}
Want to understand how a competitor organizes their catalog? This is the endpoint.
/collections/{handle}/products.json — Products by Category
Returns products filtered by a specific collection. Same JSON structure as /products.json, but scoped to one category. This is critical for category-level scraping—say you only want to monitor a competitor's "Sale" or "New Arrivals" collection.
/meta.json — Store-Level Metadata
Returns store name, description, currency, country, and—here's the good part—published_products_count. That count lets you pre-calculate exactly how many pagination pages you'll need: ceil(published_products_count / 250). No more blindly incrementing pages until you hit an empty response.
Which Endpoint Should You Use?
| What You Want | Endpoint | Auth Needed? |
|---|---|---|
| All products (public) | /products.json | No |
| Products in a specific category | /collections/{handle}/products.json | No |
| Store metadata + product count | /meta.json | No |
| All collections (categories) | /collections.json | No |
| Order/sales data (own store only) | Admin API /orders.json | Yes (API key) |
| Inventory quantities (own store only) | Admin API /inventory_levels.json | Yes |
The recurring forum question—"Can I scrape how many units a competitor sold?"—has a short answer: no. Not from public endpoints. Sales data and inventory quantities require the authenticated Admin API, which means you need store owner access. Public endpoints give you product catalog data only.

How to Scrape Shopify with Python: Step-by-Step Setup
- Difficulty: Beginner
- Time Required: ~15 minutes (setup + first scrape)
- What You'll Need: Python 3.11+,
pip, a terminal, and a Shopify store URL to scrape
Step 1: Install Python and Required Libraries
Make sure you have Python 3.11 or newer installed (pandas 3.0.x requires it). Then install the two libraries we need:
1pip install requests pandas
For Excel export, you'll also want:
1pip install openpyxl
At the top of your script, add these imports:
1import requests
2import pandas as pd
3import time
4import random
5import json
You should see no import errors when you run the script. If pandas throws a version error, upgrade Python to 3.12.
Step 2: Fetch Product Data from /products.json
Here's a basic function that takes a store URL, hits the endpoint, and returns parsed JSON:
1def fetch_products_page(store_url, page=1, limit=250):
2 """Fetch a single page of products from a Shopify store."""
3 url = f"{store_url.rstrip('/')}/products.json"
4 params = {"limit": limit, "page": page}
5 headers = {
6 "User-Agent": "Mozilla/5.0 (compatible; ProductResearch/1.0)"
7 }
8 response = requests.get(url, params=params, headers=headers, timeout=30)
9 response.raise_for_status()
10 return response.json().get("products", [])
Key details:
limit=250is the maximum Shopify allows per page. The default is 30, so setting this explicitly reduces your total requests by up to 8x.User-Agentheader: Always set a realistic one. Requests without a User-Agent are more likely to trigger Shopify's anti-bot systems.timeout=30: Don't let a single request hang forever.
Test it with a known store:
1products = fetch_products_page("https://allbirds.com")
2print(f"Fetched {len(products)} products")
3print(f"First product: {products[0]['title']}")
You should see something like: Fetched 250 products and the first product title.
Step 3: Handle Pagination to Scrape All Products
A single request returns at most 250 products. Most stores have more than that (Allbirds has 1,420+). You need to loop through pages until you get an empty response.
1def scrape_all_products(store_url, delay=1.0):
2 """Scrape all products from a Shopify store, handling pagination."""
3 all_products = []
4 page = 1
5 while True:
6 print(f"Fetching page {page}...")
7 products = fetch_products_page(store_url, page=page, limit=250)
8 if not products:
9 print(f"No more products. Total: {len(all_products)}")
10 break
11 all_products.extend(products)
12 print(f" Got {len(products)} products (total so far: {len(all_products)})")
13 page += 1
14 # Be polite: wait between requests
15 time.sleep(delay + random.uniform(0, 0.5))
16 return all_products
When products comes back empty, you've reached the end.
The time.sleep() with random jitter keeps you under Shopify's informal rate limit (~2 requests/second).
Pro tip: If you fetched /meta.json first, you already know the total product count and can calculate exactly how many pages you need: pages = ceil(product_count / 250). This avoids the "one extra empty request at the end" pattern.
Step 4: Parse and Select the Fields You Need
Now that you have all products as a Python list of dictionaries, extract just the fields you care about. Here's an example that pulls the most common fields for price monitoring:
1def extract_product_data(products):
2 """Extract key fields from products, flattening variants."""
3 rows = []
4 for product in products:
5 for variant in product.get("variants", []):
6 rows.append({
7 "product_id": product["id"],
8 "title": product["title"],
9 "handle": product["handle"],
10 "vendor": product.get("vendor", ""),
11 "product_type": product.get("product_type", ""),
12 "tags": ", ".join(product.get("tags", [])),
13 "created_at": product.get("created_at", ""),
14 "variant_id": variant["id"],
15 "variant_title": variant.get("title", ""),
16 "sku": variant.get("sku", ""),
17 "price": variant.get("price", ""),
18 "compare_at_price": variant.get("compare_at_price", ""),
19 "available": variant.get("available", ""),
20 "image_url": product["images"][0]["src"] if product.get("images") else ""
21 })
22 return rows
This creates one row per variant—the most useful format for price comparison, since a single product like "Classic Blue Jeans" might have 12 variants (6 sizes × 2 colors), each with its own price and availability status.
Export Scraped Shopify Data to CSV and Excel with pandas
Every other Shopify scraping tutorial dumps raw JSON to a file and calls it done. Fine for developers. Useless for the ecommerce analyst who needs a spreadsheet by Friday.
The problem: Shopify's JSON is nested. One product can contain a dozen variants, each with its own price, SKU, and availability. Flattening that into rows and columns takes some pandas work.
Flatten Nested JSON into a Clean Table
There are two approaches, depending on your use case:
Option A: One row per variant (best for price monitoring and inventory tracking)
1# Using the extract_product_data function from Step 4
2products = scrape_all_products("https://allbirds.com")
3rows = extract_product_data(products)
4df = pd.DataFrame(rows)
5print(f"DataFrame shape: {df.shape}")
6print(df.head())
This gives you a flat table where each row is a unique product-variant combination. A store with 500 products and an average of 4 variants per product yields a ~2,000-row DataFrame.
Option B: One row per product summary (best for catalog overviews)
1def summarize_products(products):
2 """One row per product with min/max price across variants."""
3 rows = []
4 for product in products:
5 prices = [float(v["price"]) for v in product.get("variants", []) if v.get("price")]
6 rows.append({
7 "product_id": product["id"],
8 "title": product["title"],
9 "vendor": product.get("vendor", ""),
10 "product_type": product.get("product_type", ""),
11 "variant_count": len(product.get("variants", [])),
12 "min_price": min(prices) if prices else None,
13 "max_price": max(prices) if prices else None,
14 "any_available": any(v.get("available", False) for v in product.get("variants", [])),
15 "tags": ", ".join(product.get("tags", []))
16 })
17 return rows
Export to CSV, Excel, and Google Sheets
1# CSV export (use utf-8-sig so Excel handles special characters)
2df.to_csv("shopify_products.csv", index=False, encoding="utf-8-sig")
3# Excel export (requires openpyxl)
4df.to_excel("shopify_products.xlsx", index=False, engine="openpyxl")
5print("Exported to shopify_products.csv and shopify_products.xlsx")
For Google Sheets, you can use the gspread library with a service account, but honestly—for most use cases, exporting to CSV and uploading to Google Drive is faster and simpler.
Production-Ready Python Scraping: Rate Limits, Retries, and Anti-Blocking
The basic script handles small stores fine. Scraping a store with 5,000+ products, or hitting multiple stores in sequence? That's where things break.
Understanding Shopify's Rate Limits and Blocking Behavior
Shopify's public JSON endpoints don't have formally documented rate limits (unlike the Admin API's leaky bucket model), but empirical testing reveals:
- Safe rate: ~2 requests per second per store
- Soft ceiling: ~40 requests per minute before throttling kicks in
- HTTP 429: "Too Many Requests"—the standard rate-limit response
- HTTP 430: A Shopify-specific code indicating a security-level block (not just rate limiting)
- HTTP 403 or CAPTCHA redirect: Some stores with additional Cloudflare protection
Requests from shared cloud infrastructure (AWS Lambda, Google Cloud Run) are particularly likely to trigger blocks because those IP ranges have high abuse rates.
Techniques to Scrape Shopify Reliably
Here's the progression from "works on my laptop" to "runs in production":
| Level | Technique | Reliability |
|---|---|---|
| Basic | requests.get() + ?page= | Breaks on large catalogs, may get blocked |
| Intermediate | requests.Session() + ?limit=250 + time.sleep(1) + retry on 429 | Works for most stores |
| Advanced | Async httpx + rotating User-Agent + exponential backoff | Production-grade, scales to 10K+ products |
Intermediate level (recommended for most users):
1import requests
2from requests.adapters import HTTPAdapter
3from urllib3.util.retry import Retry
4def create_session():
5 """Create a requests session with automatic retry logic."""
6 session = requests.Session()
7 retries = Retry(
8 total=5,
9 backoff_factor=1, # sleep: 0.5s, 1s, 2s, 4s, 8s
10 status_forcelist=[429, 430, 500, 502, 503, 504],
11 respect_retry_after_header=True
12 )
13 session.mount("https://", HTTPAdapter(max_retries=retries))
14 session.headers.update({
15 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36"
16 })
17 return session
The Retry configuration handles 429 responses automatically with exponential backoff. The backoff_factor=1 means the sleep sequence is 0.5s → 1s → 2s → 4s → 8s between retries. Session reuse (requests.Session()) also gives you connection pooling, which reduces overhead when making multiple requests to the same domain.
User-Agent rotation: If you're scraping multiple stores, rotate between 3–5 realistic browser User-Agent strings. This isn't about deception—it's about not looking like a bot that sends identical headers on every request.
Full Working Python Script to Scrape Shopify with CSV Export
Here's the complete, copy-paste-ready script that combines everything above. It's about 75 lines of actual code (plus comments), and I've tested it against Allbirds (1,420 products), ColourPop (2,000+ products), and Zoologist Perfumes (small catalog).
1import requests
2import pandas as pd
3import time
4import random
5from requests.adapters import HTTPAdapter
6from urllib3.util.retry import Retry
7def create_session():
8 """Create a session with retry logic for rate limits."""
9 session = requests.Session()
10 retries = Retry(
11 total=5,
12 backoff_factor=1,
13 status_forcelist=[429, 430, 500, 502, 503, 504],
14 respect_retry_after_header=True
15 )
16 session.mount("https://", HTTPAdapter(max_retries=retries))
17 session.headers.update({
18 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
19 "AppleWebKit/537.36 (KHTML, like Gecko) "
20 "Chrome/125.0.0.0 Safari/537.36"
21 })
22 return session
23def scrape_shopify(store_url, delay=1.0):
24 """Scrape all products from a Shopify store via /products.json."""
25 session = create_session()
26 all_products = []
27 page = 1
28 base_url = f"{store_url.rstrip('/')}/products.json"
29 while True:
30 print(f" Page {page}...", end=" ")
31 resp = session.get(base_url, params={"limit": 250, "page": page}, timeout=30)
32 resp.raise_for_status()
33 products = resp.json().get("products", [])
34 if not products:
35 break
36 all_products.extend(products)
37 print(f"{len(products)} products (total: {len(all_products)})")
38 page += 1
39 time.sleep(delay + random.uniform(0, 0.5))
40 return all_products
41def flatten_to_variants(products):
42 """Flatten nested product JSON into one row per variant."""
43 rows = []
44 for p in products:
45 base = {
46 "product_id": p["id"],
47 "title": p["title"],
48 "handle": p["handle"],
49 "vendor": p.get("vendor", ""),
50 "product_type": p.get("product_type", ""),
51 "tags": ", ".join(p.get("tags", [])),
52 "created_at": p.get("created_at", ""),
53 "updated_at": p.get("updated_at", ""),
54 "image_url": p["images"][0]["src"] if p.get("images") else "",
55 }
56 for v in p.get("variants", []):
57 row = {**base}
58 row["variant_id"] = v["id"]
59 row["variant_title"] = v.get("title", "")
60 row["sku"] = v.get("sku", "")
61 row["price"] = v.get("price", "")
62 row["compare_at_price"] = v.get("compare_at_price", "")
63 row["available"] = v.get("available", "")
64 rows.append(row)
65 return rows
66if __name__ == "__main__":
67 STORE_URL = "https://allbirds.com" # Change this to your target store
68 OUTPUT_CSV = "shopify_products.csv"
69 OUTPUT_EXCEL = "shopify_products.xlsx"
70 print(f"Scraping {STORE_URL}...")
71 products = scrape_shopify(STORE_URL)
72 print(f"\nTotal products scraped: {len(products)}")
73 print("Flattening to variant-level rows...")
74 rows = flatten_to_variants(products)
75 df = pd.DataFrame(rows)
76 print(f"DataFrame: {df.shape[0]} rows x {df.shape[1]} columns")
77 df.to_csv(OUTPUT_CSV, index=False, encoding="utf-8-sig")
78 df.to_excel(OUTPUT_EXCEL, index=False, engine="openpyxl")
79 print(f"\nExported to {OUTPUT_CSV} and {OUTPUT_EXCEL}")
Run it with python scrape_shopify.py. For Allbirds, this takes about 45 seconds and produces a CSV with ~5,000+ rows (one per variant). The terminal output looks something like:
1Scraping https://allbirds.com...
2 Page 1... 250 products (total: 250)
3 Page 2... 250 products (total: 500)
4 ...
5 Page 6... 170 products (total: 1420)
6Total products scraped: 1420
7Flattening to variant-level rows...
8DataFrame: 5680 rows x 14 columns
9Exported to shopify_products.csv and shopify_products.xlsx
Skip the Python: Scrape Shopify in 2 Clicks with Thunderbit (No-Code Alternative)
Not everyone wants to install Python, debug import errors, or maintain a scraping script. For the sales rep who needs competitor pricing by tomorrow morning, Python is overkill.
That's why we built —an AI web scraper that runs as a Chrome extension. No code, no API keys, no environment setup.
How Thunderbit Scrapes Shopify Stores
Thunderbit has a dedicated Shopify Scraper template that's pre-configured for Shopify product pages. You install the , navigate to a Shopify store, and click "Scrape." The template automatically extracts product names, descriptions, prices, variant details, images, and vendor information.
For stores where the template doesn't perfectly match (custom themes, unusual layouts), Thunderbit's AI Suggest Fields feature reads the page and auto-generates column names. You can customize these—rename columns, add fields, write instructions like "only extract products with compare_at_price set."
A few features that map directly to what the Python script does:
- Subpage scraping: Automatically visits each product detail page and enriches the table with full descriptions, reviews, or variant details—the same thing our Python script achieves by iterating through pages, but with zero code.
- Automatic pagination: Handles click-through pagination and infinite scroll without configuration.
- Scheduled scraping: Set up recurring jobs (e.g., "every Monday at 9am") for ongoing price monitoring—no cron job or server needed.
- Free export to CSV, Excel, Google Sheets, Airtable, or Notion across all plans.
Python Script vs. Thunderbit: Honest Comparison
| Factor | Python Script | Thunderbit (No-Code) |
|---|---|---|
| Setup time | 15–60 min (environment + code) | ~2 min (install Chrome extension) |
| Coding required | Yes (Python) | None |
| Customization | Unlimited | AI-suggested fields + custom prompts |
| Pagination handling | Must code manually | Automatic |
| Export formats | Code it yourself (CSV/Excel) | CSV, Excel, Google Sheets, Airtable, Notion (free) |
| Scheduled runs | Cron job + hosting | Built-in scheduler |
| Rate-limit handling | Must code retries/backoff | Handled automatically |
| Best for | Developers, large-scale data pipelines | Business users, quick extractions, recurring monitoring |
Use Python when you need full control or are integrating into a larger data pipeline. Use Thunderbit when you need data fast and don't want to maintain code. For a deeper look at , we've written a separate guide on that topic.

Tips and Best Practices for Scraping Shopify Stores
These hold regardless of your tool choice:
- Always use
?limit=250to minimize total requests. The default of 30 per page means 8x more requests for the same data. - Respect the store: Add 1–2 second delays between requests. Hammering a server with rapid-fire requests is bad practice and increases your chances of getting blocked.
- Check
robots.txtfirst: Shopify's defaultrobots.txtdoes NOT block/products.json. But some stores add custom rules, so verify before scraping at scale. - Store raw JSON locally first, then process. If your parsing logic changes later, you won't need to re-scrape. A simple
json.dump(all_products, open("raw_data.json", "w"))before flattening saves headaches. - Deduplicate by
product.id: Pagination edge cases can sometimes return duplicate products at page boundaries. A quickdf.drop_duplicates(subset=["product_id", "variant_id"])cleans this up. - Cast
priceto float before doing math. Shopify returns prices as strings ("185.00"), not numbers. - Monitor for endpoint changes: While
/products.jsonhas been stable for years, Shopify could theoretically restrict it. If your scraper suddenly gets 404s, check the store manually first.
For more on building robust scrapers, check out our guide on .
Legal and Ethical Considerations When Scraping Shopify
Brief section, but it matters.
The /products.json endpoint serves publicly available product data—the same information any visitor sees when browsing the store. Shopify's Terms of Service include language about not using "automated means" to access "the Services," but this language refers to the platform itself (admin dashboard, checkout), not public storefront data. No Shopify-specific scraping lawsuits have been filed as of April 2026.
Key legal precedents support public data scraping: the hiQ v. LinkedIn case established that scraping publicly accessible data doesn't violate the CFAA, and Meta v. Bright Data (2024) ruled that TOS restrictions only apply when a user is logged in.
Best practices:
- Only scrape publicly available product data
- Don't scrape personal or customer data
- Respect
robots.txtand rate limits - Comply with GDPR/CCPA if handling any personal data (product catalog data is non-personal)
- Identify yourself with a clear User-Agent string
- Scraping your own Shopify store via the Admin API is always fine
For a deeper dive, see our post on .
Conclusion and Key Takeaways
Shopify's public /products.json endpoint makes ecommerce data extraction about as easy as it gets. The workflow is: append /products.json → fetch with Python → paginate with ?limit=250&page= → flatten with pandas → export to CSV or Excel.
What this guide covers that others don't:
- Complete field reference: Know exactly what data is available (40+ fields across products, variants, and images) before you write a single line of code
- Additional endpoints:
/collections.jsonand/meta.jsongive you category-level intelligence and store metadata that no competing tutorial covers - Production-ready techniques: Session reuse, exponential backoff, User-Agent headers, and
?limit=250to handle real-world rate limits - Proper CSV/Excel export: Flattened variant-level data using pandas, not just raw JSON dumps
- No-code alternative: for users who prefer speed over code flexibility
For one-off or recurring Shopify data pulls without code, try the —the Shopify Scraper template handles everything from pagination to export. For custom data pipelines or scraping at scale across many stores, the Python script in this guide gives you full control.
Check out our for video walkthroughs, or explore our guides on and for related techniques.
FAQs
Can you scrape any Shopify store with products.json?
Most Shopify stores expose this endpoint by default—in testing, about 71% returned valid JSON. Some stores with custom configurations or additional security layers (Cloudflare, headless setups) may return 404 or block the request. The quick check: visit {store-url}/products.json in your browser. If you see JSON, you're good.
Is it legal to scrape Shopify stores?
Public product data (prices, titles, images, descriptions) is generally accessible, and legal precedents like hiQ v. LinkedIn support scraping publicly available information. That said, always check the specific store's Terms of Service and your local laws. Don't scrape personal or customer data, and respect rate limits.
How many products can you scrape from a Shopify store?
There's no hard limit on the total number. Pagination with ?limit=250&page= lets you retrieve the entire catalog. For very large stores (25,000+ products), use session reuse and delays to avoid rate limits. The /meta.json endpoint can tell you the exact product count upfront so you know how many pages to expect.
What's the difference between products.json and the Shopify Admin API?
/products.json is a public endpoint—no authentication, read-only product data, accessible to anyone. The Admin API requires store owner access tokens and provides orders, inventory quantities, customer data, and write access. If you need sales data or actual inventory counts, you need Admin API access (which means you need to be the store owner or have their permission).
Can I scrape Shopify without Python?
Absolutely. Tools like let you scrape Shopify stores from a Chrome extension with no code. It handles pagination automatically and exports directly to CSV, Excel, Google Sheets, Airtable, or Notion. For developers who prefer other languages, the same /products.json endpoint works with JavaScript, Ruby, Go—any language that can make HTTP requests and parse JSON.
Learn More
