How to Scrape Redfin with Python (Without Getting Blocked)

Last Updated on April 15, 2026

Redfin pulls listing updates straight from MLS feeds, often surfacing them within minutes of hitting the wire. That kind of freshness is catnip for anyone building a real estate data pipeline — and it's exactly why so many scrapers target Redfin and get blocked within minutes.

I've spent years working on data extraction tools at Thunderbit, and I can tell you: the gap between "scrape Redfin" and "scrape Redfin without getting blocked" is where most tutorials fall apart. They show you the BeautifulSoup code, skip the part where Cloudflare eats your requests alive, and leave you staring at a 403 page wondering what went wrong. This guide is different. I'll walk you through three real approaches — HTML parsing, Redfin's hidden API, and a no-code path with Thunderbit — and spend serious time on the anti-bot defenses that actually matter. By the end, you'll know exactly which method fits your skill level, your scale, and your tolerance for maintenance headaches.

What Is Redfin, and Why Does Its Data Matter?

Redfin is a technology-powered real estate brokerage with salaried agents who pull listings directly from MLS feeds. It covers markets across the United States and Canada and serves nearly 50 million monthly visitors. Unlike aggregator-only portals, Redfin's data is agent-verified, and its proprietary Redfin Estimate AVM achieves a median error of just 1.96% for on-market homes.


That combination — MLS-speed freshness, broker-verified quality, and a tight AVM — is why real estate investors, agents, proptech startups, and data analysts all want programmatic access to Redfin data. Python is the natural choice for the job: its scraping ecosystem (requests, BeautifulSoup, Selenium, Playwright) is mature, the community support is enormous, and it plugs directly into pandas and Jupyter for analysis.

Why Scrape Redfin with Python?

The use cases are as varied as the people who need the data. Here's how different audiences typically use scraped Redfin data:

| Audience | Primary Scraping Goal | Example Use Case |
|---|---|---|
| Real estate agents | Lead generation, market intel | New listings and expired listings in service area; agent directory for competitive benchmarking |
| Real estate investors | Deal flow, cap-rate analysis | Rental yield screens, undervalued property detection, daily new-listing alerts |
| Proptech startups | Product data pipelines | AVM training data, market dashboards, iBuyer acquisition engines |
| Data analysts | Market research, BI | ZIP-level median price trends, days-on-market time series, sale-to-list ratios |
| Wholesalers / flippers | Distressed property tracking | Price-cut detection, foreclosures, off-market comps |

The broader trend backs this up: real estate firms increasingly use predictive analytics to identify opportunities and manage risk, and the PropTech market is projected to keep growing at a 16.4% CAGR. Structured real estate data isn't a nice-to-have anymore — it's table stakes.

Every Redfin Data Field You Can Scrape (Complete Reference)

Before writing a single line of code, you need to know what's actually available. I've audited Redfin's search results pages, property detail pages, and agent profiles — and cross-referenced with open-source Stingray API wrapper projects on GitHub. The total count comes to 117 distinct fields across page types.

This table is designed to be bookmarked. Knowing your data schema before you code saves hours of trial-and-error selector hunting.

Search Results Page Fields

These are the lightweight fields available on listing cards — often extractable without full JS rendering:

| Field | Data Type | Notes |
|---|---|---|
| Property ID | Number | Redfin internal int, parsed from /home/{id} in href |
| List price | Number | |
| Full address | Text | |
| Beds / Baths / SqFt | Number | Three values in sequence |
| Property type | Single Select | SFH, Condo, Townhouse, Multi |
| Status | Text | Active, Pending, Contingent |
| Days on market | Number | |
| Price cut indicator | Number | Delta from original list |
| Primary photo | Image URL | One photo per card |
| Hot Home badge | Boolean | |
| Open house date/time | Text | |
| Brokerage attribution | Text | |

Property Detail Page Fields

The detail page is where the real depth lives. Many of these fields require JavaScript rendering or the Stingray API:

| Field | Data Type | Notes |
|---|---|---|
| Redfin Estimate (on-market) | Number | Via /stingray/api/home/details/avm |
| Redfin Estimate (off-market) | Number | Via /stingray/api/home/details/owner-estimate; 7.52% median error |
| Year built / renovated | Number | |
| Lot size | Number | |
| HOA dues | Number | Monthly, if applicable |
| Property tax (annual) | Number | |
| Tax assessed value | Number | |
| Sale history table | Table | Price, date, event type |
| Property description | Text | Marketing paragraph |
| Photo URLs (carousel) | Image URLs | 20+ per listing |
| Listing agent name, phone, email | Text / Phone / Email | Phone often masked |
| School ratings (elementary/middle/high) | Number | Plus district name |
| Walk / Transit / Bike Score | Number | |
| Climate risk scores | Number | Flood, fire, heat, wind |
| Similar active / sold / nearby homes | URLs | Carousel data |
| Parking, garage, heating, cooling | Text | Amenity groups |

Agent Profile Fields

| Field | Data Type | Notes |
|---|---|---|
| Agent name, photo, brokerage, bio | Text / Image | |
| Phone, contact form | Phone / Text | Click-to-reveal |
| Active listings count | Number | |
| Sales last 12 months / total volume | Number | |
| Avg list-to-sale ratio | Number | |
| Star rating / review count | Number | |
| Years of experience / license # | Text / Number | |

When you use Thunderbit's AI Suggest Fields feature on a Redfin page, it auto-detects most of these columns and assigns the correct data types — no manual CSS selector mapping required. More on that later.

Redfin's Anti-Bot Defenses Decoded (Not Just "Use a Proxy")

This is where I want to plant a flag, because most tutorials hand-wave past the blocking problem and jump to "buy proxies from our sponsor." That's not helpful. If you don't understand what Redfin does to detect scrapers, you'll burn through proxy credits and still get blocked. Community analyses consistently describe Redfin's stack as "less aggressive than Zillow's enterprise WAF, relying on custom rate limiting and JavaScript challenges."

Redfin runs a layered stack: Cloudflare at the edge (JS challenge, Turnstile, TLS/JA3 fingerprinting) plus a Redfin-specific application-layer rate limiter. There's no Crawl-delay directive in their robots.txt because enforcement happens at the WAF level.

Why Simple requests + BeautifulSoup Fails on Redfin

If you fire off a basic requests.get() to a Redfin property page with default headers, here's what typically happens:

  • HTTP 403 — Cloudflare's JS challenge wasn't solved, so you get the challenge page instead of the listing.
  • An interstitial challenge page — HTML body contains Cloudflare's Turnstile widget, not property data.
  • HTTP 200 with partial HTML — You get a shell with a large embedded JSON blob under root.__reactServerState.InitialContext, but no pre-rendered search cards, no price history, no school ratings.

Redfin uses its own React server-state hydration scheme (not Next.js), and the hydration key is Redfin-specific: root.__reactServerState.InitialContext, with listing data nested under ReactServerAgent.cache.dataCache. This is not __NEXT_DATA__ or window.__INITIAL_STATE__.
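When you do get one of those HTTP 200 shells, the embedded blob is still recoverable without a browser. Below is a minimal sketch, assuming the JSON is assigned in a single statement terminated by "};"; it will miss edge cases such as escaped braces inside strings, so treat it as a starting point rather than a robust parser:

```python
import json
import re

def extract_initial_context(html: str):
    """Pull the hydration JSON assigned to root.__reactServerState.InitialContext.

    Assumes a single `... = { ... };` assignment; a real page may need
    smarter brace matching or an HTML-aware pass over <script> tags.
    """
    m = re.search(
        r"root\.__reactServerState\.InitialContext\s*=\s*(\{.*?\});",
        html,
        re.DOTALL,
    )
    if not m:
        return None
    return json.loads(m.group(1))
```

Listing data then lives under ReactServerAgent.cache.dataCache inside the returned dict.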

The single most common cause of silent 403s? Missing Sec-Fetch-* headers. Redfin/Cloudflare explicitly validates Sec-Fetch-Site, Sec-Fetch-Mode, Sec-Fetch-Dest, and Sec-Fetch-User. If they're absent, you're flagged immediately.

The Mitigation Playbook: Delays, Headers, Proxies, and Sessions

Here's the full defense-by-defense breakdown, with specific mitigations for each:

| Redfin Defense | What It Does | Detection Signal | Mitigation Strategy |
|---|---|---|---|
| Cloudflare JS challenge | Interstitial that issues cf_clearance cookie | 403 + Cloudflare HTML body | curl_cffi with impersonate="chrome120"; warm session via homepage; US residential proxy |
| Cloudflare Turnstile | Interactive CAPTCHA on high-risk sessions | 403 + Turnstile widget | Headless browser with stealth + residential proxy |
| Cloudflare Error 1020 (ASN ban) | Blocks flagged IPs/ASNs at WAF | 403 body "Error 1020 Access Denied" | Rotate to residential/mobile proxy; never use datacenter ASNs |
| TLS/JA3 fingerprinting | Detects non-browser TLS stacks | Silent 403 even with perfect headers | curl_cffi impersonation or real browser |
| HTTP/2 fingerprinting | Checks HTTP/2 SETTINGS, HPACK order | Silent block | curl_cffi speaks HTTP/2 like Chrome |
| Header validation (UA, Sec-Fetch-*) | Expects browser-consistent header set | 403 on first request | Full Chrome header set including Sec-Fetch-Site/Mode/Dest/User, realistic Referer |
| Cookie/session continuity | Tracks cf_clearance, RF_BROWSER_ID | Challenges on cold deep-URL hits | Persistent Session; warm on homepage first |
| App-layer rate limit | Per-IP request limiter | 429 | 2–5s delay with jitter; exponential backoff |
| Datacenter IP reputation | Blocks known DC ASNs | Immediate 1020/403 | US residential or mobile proxies only |
| Concurrency detection | Multiple parallel requests from one IP | Sudden Turnstile escalation | ≤2 concurrent per IP |

Practical thresholds from community testing:

  • Safe cadence: 1 request per 2–3 seconds per IP
  • Sustained >20–30 req/min from a single datacenter IP triggers a challenge within minutes
  • Soft rate-limits lift in 5–15 minutes if traffic stops
  • Datacenter IP bans (AWS, GCP, Azure, OVH) can persist hours to days
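Those thresholds translate directly into a pacing helper. A sketch assuming the 2–5 second cadence above (the class name and defaults are mine, not Redfin's):

```python
import random
import time

class Throttle:
    """Keep request cadence inside the community-tested safe zone:
    roughly one request every 2-5 seconds, well under 20 req/min."""

    def __init__(self, min_gap: float = 2.0, max_gap: float = 5.0):
        self.min_gap = min_gap
        self.max_gap = max_gap
        self._last = 0.0

    def wait(self):
        # Sleep only for whatever part of the jittered gap hasn't already elapsed
        gap = random.uniform(self.min_gap, self.max_gap)
        elapsed = time.monotonic() - self._last
        if elapsed < gap:
            time.sleep(gap - elapsed)
        self._last = time.monotonic()
```

Call throttle.wait() before each request and the jitter comes for free.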

Stock Python requests (urllib3 + OpenSSL) produces a TLS fingerprint that no real browser shares — and gets blocked silently even with perfect headers. The industry fix is curl_cffi with impersonate="chrome120", which speaks Chrome-accurate TLS + HTTP/2.

Three Ways to Scrape Redfin with Python (and Which to Pick)


I haven't found a single competing tutorial that compares all three approaches side by side. Here's the decision matrix:

| Criteria | HTML Parsing (BS4 + Selenium) | Stingray Hidden API | Thunderbit (No-Code) |
|---|---|---|---|
| Setup difficulty | Medium (Python env + browser driver) | High (reverse-engineering endpoints) | Low (Chrome extension install) |
| Anti-bot risk | High (DOM requests are most visible) | Medium (API-like requests look cleaner) | Lowest (uses your real browser session) |
| Data structure quality | Medium (unstructured HTML → manual parsing) | Excellent (pre-structured JSON) | High (AI auto-detects fields + types) |
| Maintenance burden | High — one layout change breaks selectors | Medium — endpoints can change without notice | Lowest — AI adapts to layout changes |
| Scale | Low–medium (hundreds with proxies) | Medium–high (thousands, cleaner requests) | Medium (50 pages/batch via cloud scraping) |
| Best for | Developers wanting full control | Developers needing clean JSON | Non-devs, quick projects, ongoing data without dev resources |

The maintenance angle is worth emphasizing. Redfin has shipped two card DOM generations — legacy (homecardV2Price) and current (span.bp-Homecard__Price--value). Community GitHub issue history shows CSS-selector breakage roughly every 6–12 months. When that happens, a BeautifulSoup scraper breaks overnight. An AI-based field detector adapts.
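If you do maintain selectors, the cheap insurance is a fallback chain that tries the current generation first and the legacy one second. A sketch using the two class names above (anything with a select_one method, such as a BeautifulSoup element, works):

```python
# Newest-first selector chain covering the two known card generations
PRICE_SELECTORS = [
    "span.bp-Homecard__Price--value",  # current (2024+)
    "span.homecardV2Price",            # legacy
]

def first_match(card, selectors):
    """Return the first element any selector finds, else None."""
    for sel in selectors:
        el = card.select_one(sel)
        if el is not None:
            return el
    return None
```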

Before You Start

  • Difficulty: Intermediate (Approaches 1 & 2), Beginner (Approach 3)
  • Time Required: ~30 minutes for Approach 1 or 2; ~5 minutes for Approach 3
  • What You'll Need:
    • Python 3.8+ with pip (Approaches 1 & 2)
    • Chrome browser (all approaches)
    • Thunderbit Chrome extension (Approach 3)
    • US residential proxies for large-scale scraping (Approaches 1 & 2)

Approach 1: Scrape Redfin with Python Using HTML Parsing (BeautifulSoup + Selenium)

This is the "full control" path. You write the selectors, you manage the browser, you handle the errors.

It's the most educational approach. It's also the most fragile.

Step 1: Set Up Your Python Environment

Create a virtual environment and install the required libraries:

```shell
python -m venv redfin-scraper
source redfin-scraper/bin/activate  # On Windows: redfin-scraper\Scripts\activate
pip install requests beautifulsoup4 selenium webdriver-manager pandas curl_cffi
```

curl_cffi is essential here — it's what lets your HTTP requests impersonate a real Chrome TLS fingerprint instead of the stock Python requests fingerprint that Cloudflare blocks on sight.

Step 2: Configure Browser Headers and Session

This is where most beginners fail. You need the full Chrome header set, including the Sec-Fetch-* headers that Redfin/Cloudflare explicitly validates:

```python
from curl_cffi import requests as curl_requests

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/120.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-User": "?1",
}

session = curl_requests.Session(impersonate="chrome120")
session.headers.update(HEADERS)

# Warm the session — collect cf_clearance and RF_BROWSER_ID cookies
session.get("https://www.redfin.com/")
```

The session warming step is critical — hitting a deep property URL cold (no prior cookies, no Referer) gets scored down by Cloudflare.

Always start with the homepage.

Step 3: Scrape Redfin Search Results

With your session warmed, you can fetch a city search page and parse the listing cards. Current-generation selectors (2024–2026):

```python
import random
import time

from bs4 import BeautifulSoup

base_url = "https://www.redfin.com/city/17151/CA/San-Francisco"
listings = []

for page_num in range(1, 6):  # Pages 1-5
    url = f"{base_url}/page-{page_num}" if page_num > 1 else base_url
    resp = session.get(url)
    if resp.status_code != 200:
        print(f"Blocked on page {page_num}: HTTP {resp.status_code}")
        break
    soup = BeautifulSoup(resp.text, "html.parser")
    cards = soup.select("[data-rf-test-id='property-card'], a.bp-Homecard")
    for card in cards:
        price_el = card.select_one("span.bp-Homecard__Price--value")
        addr_el = card.select_one("a.bp-Homecard__Address")
        stats = card.select("span.bp-Homecard__LockedStat--value")
        listing = {
            "price": price_el.text.strip() if price_el else None,
            "address": addr_el.text.strip() if addr_el else None,
            "beds": stats[0].text.strip() if len(stats) > 0 else None,
            "baths": stats[1].text.strip() if len(stats) > 1 else None,
            "sqft": stats[2].text.strip() if len(stats) > 2 else None,
            "url": "https://www.redfin.com" + addr_el["href"] if addr_el else None,
        }
        listings.append(listing)
    # Random delay between 2-5 seconds
    time.sleep(random.uniform(2, 5))

print(f"Scraped {len(listings)} listings")
```

You should see a growing list of dictionaries, each containing a San Francisco listing's price, address, beds/baths/sqft, and detail URL. If you get 0 cards, check the HTTP status code — a 403 means Cloudflare caught you, and you likely need residential proxies.
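Note that the scraped values are display strings like "$1,250,000" and "1,850". A small normalizer (assuming Redfin's usual formatting) makes them usable in pandas:

```python
import re

def to_number(text):
    """Convert a Redfin display string ("$1,250,000", "3.5", "1,850") to a float."""
    if not text:
        return None
    digits = re.sub(r"[^\d.]", "", text)  # strip $, commas, units
    return float(digits) if digits else None
```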

Step 4: Scrape Individual Property Detail Pages

Search results give you the basics. Detail pages give you the Redfin Estimate, year built, HOA, sale history, agent info, and photos. These pages require JavaScript rendering, so switch to Selenium:

```python
import random
import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException
from webdriver_manager.chrome import ChromeDriverManager

options = webdriver.ChromeOptions()
options.add_argument("--headless=new")
options.add_argument("--disable-blink-features=AutomationControlled")
options.add_argument("user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                     "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)

for listing in listings[:10]:  # Enrich first 10
    driver.get(listing["url"])
    time.sleep(random.uniform(3, 6))  # Wait for JS rendering
    try:
        estimate_el = driver.find_element(By.CSS_SELECTOR, "[data-rf-test-name='avmLdpPrice']")
        listing["redfin_estimate"] = estimate_el.text.strip()
    except NoSuchElementException:
        listing["redfin_estimate"] = None
    try:
        year_built = driver.find_element(By.XPATH, "//span[contains(text(),'Year Built')]/following-sibling::span")
        listing["year_built"] = year_built.text.strip()
    except NoSuchElementException:
        listing["year_built"] = None

driver.quit()
```

After this step, your first 10 listings should be enriched with Redfin Estimate values and year-built data. The XPath selectors are more resilient than CSS for these nested amenity fields, but they're still fragile — any DOM restructuring will break them.

Step 5: Handle Blocks and Errors

Implement retry logic with exponential backoff:

```python
import random
import time

def fetch_with_retry(session, url, max_retries=3):
    for attempt in range(max_retries):
        resp = session.get(url)
        if resp.status_code == 200:
            return resp
        elif resp.status_code in (403, 429, 503):
            # Exponential backoff with jitter: ~1-3s, ~3-5s, ~5-7s
            wait = (2 ** attempt) + random.uniform(1, 3)
            print(f"Blocked ({resp.status_code}). Retrying in {wait:.1f}s...")
            time.sleep(wait)
        else:
            print(f"Unexpected status: {resp.status_code}")
            break
    return None
```

Signs you've been blocked: HTTP 403 with Cloudflare HTML in the body, HTTP 429 (explicit rate limit), empty response body, or "Error 1020 Access Denied" in the page content. If you're hitting these consistently, it's time to add residential proxies or switch to the API approach.
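Those signals are easy to centralize in one check. A heuristic sketch (the marker strings are ones I've seen on Cloudflare challenge pages; tune them against your own traffic):

```python
BLOCK_MARKERS = ("Error 1020", "Access Denied", "cf-turnstile", "Attention Required")

def looks_blocked(status_code: int, body: str) -> bool:
    """Heuristic: status codes and body markers that indicate a Cloudflare block."""
    if status_code in (403, 429, 503):
        return True
    if not body.strip():
        return True  # an empty body on a listing URL is itself a red flag
    return any(marker in body for marker in BLOCK_MARKERS)
```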

Approach 2: Scrape Redfin with Python Using the Hidden Stingray API

This is my favorite approach. Redfin's frontend talks to an internal JSON API at /stingray/api/home/details/*, and the responses come back as clean, typed JSON — no HTML parsing required.

How to Discover Redfin's Hidden API Endpoints

Open Chrome DevTools → Network tab → filter by Fetch/XHR → navigate to any Redfin property page. You'll see requests to endpoints like:

  • api/home/details/initialInfo — resolves URL → propertyId, listingId
  • api/home/details/aboveTheFold — price, beds, baths, sqft, photos, status, agent, MLS#
  • api/home/details/belowTheFold — amenities, HOA, taxes, parking, year built, lot, history
  • api/home/details/avm — on-market Redfin Estimate
  • api/home/details/owner-estimate — off-market Redfin Estimate
  • api/home/details/descriptiveParagraph — marketing description

For rental pages, the rentalId (a 36-character UUID) is extracted from the <meta property="og:image"> tag URL.
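Two mechanics are worth isolating into helpers before wiring up a session: building the initialInfo call that resolves a listing URL path to its propertyId (the path query parameter here follows the community wrappers and is an assumption; verify it in DevTools), and stripping the anti-CSRF prefix described in the next section:

```python
import json

STINGRAY = "https://www.redfin.com/stingray/api/home/details/"

def initial_info_url(property_path: str) -> str:
    """URL that resolves a listing path like /CA/.../home/123 to propertyId/listingId."""
    return f"{STINGRAY}initialInfo?path={property_path}"

def parse_stingray(text: str) -> dict:
    """Stingray responses carry a literal {}&& prefix that must be stripped."""
    prefix = "{}&&"
    if text.startswith(prefix):
        text = text[len(prefix):]
    return json.loads(text)
```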

Scraping Property Data via the Stingray API

There's a critical quirk: Stingray JSON responses are prefixed with the literal string {}&& as an anti-CSRF measure. You must strip this before parsing:

```python
import json

from curl_cffi import requests as curl_requests

session = curl_requests.Session(impersonate="chrome120")
session.headers.update(HEADERS)  # the Chrome header set from Approach 1, Step 2

# Warm session
session.get("https://www.redfin.com/")

# Fetch a property page to get cookies and property ID
property_url = "https://www.redfin.com/CA/San-Francisco/123-Main-St-94102/home/12345678"
page_resp = session.get(property_url)

# Now hit the Stingray API
api_url = "https://www.redfin.com/stingray/api/home/details/aboveTheFold?propertyId=12345678"
api_resp = session.get(api_url, headers={"Referer": property_url})

# Strip the anti-CSRF prefix
payload = json.loads(api_resp.text.replace("{}&&", "", 1))

# Extract structured data
listing_data = payload.get("payload", {})
print(json.dumps(listing_data, indent=2))
```

The response includes typed fields: price as an integer, beds/baths as numbers, photo URLs as arrays, agent info as nested objects. No BeautifulSoup parsing, no CSS selectors, no guessing.
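The nesting does run deep, though, so a safe-walk helper beats long chains of .get() calls. The key path in the usage comment is illustrative only; inspect a real aboveTheFold response before depending on any particular path:

```python
def pick(data, *path, default=None):
    """Walk nested dicts safely, returning default if any key is missing."""
    cur = data
    for key in path:
        if not isinstance(cur, dict) or key not in cur:
            return default
        cur = cur[key]
    return cur

# beds = pick(payload, "payload", "addressSectionInfo", "beds")  # hypothetical path
```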

Pros and Limitations of the Hidden API Approach

Pros:

  • Pre-structured JSON — dramatically cleaner than HTML parsing
  • Faster per-request (smaller payloads, no rendering overhead)
  • Lower block risk (API-like requests with proper headers look more natural)

Limitations:

  • Endpoints can change without notice — there's no official documentation
  • robots.txt explicitly disallows /stingray/ for the wildcard user-agent
  • Requires reverse-engineering to discover new endpoints
  • Still needs session warming and proper headers to avoid Cloudflare

The No-Code Alternative: Scrape Redfin with Thunderbit

If you need Redfin data and don't want to maintain Python scripts — or you just want results in five minutes — start here. We built Thunderbit for exactly this: structured data extraction from any website, no code required.

Step 1: Install Thunderbit and Navigate to Redfin

Install the Thunderbit extension from the Chrome Web Store. Open Redfin and navigate to a search results page — say, San Francisco homes for sale.

Step 2: Click "AI Suggest Fields"

Click the Thunderbit icon in your browser toolbar, then click "AI Suggest Fields." The AI reads the Redfin page and auto-suggests columns like "Address," "Price," "Beds," "Baths," "SqFt," "Property Type," and "Listing Photo" — with correct data types assigned automatically.

You can remove columns you don't need or add custom ones by clicking "+ Add Column" and describing what you want in plain English (e.g., "listing agent name" or "days on market").

You should see a table preview with your configured columns, ready to populate.

Step 3: Click "Scrape" and Watch the Data Roll In

Click the "Scrape" button. Thunderbit processes the visible listings and populates your table. For paginated results, it handles pagination automatically — no loop logic required.

In my testing, a 50-row table fills in about 45 seconds. Structured data, ready to export.

How Thunderbit Handles Redfin's Anti-Bot Protections

Because Thunderbit runs in your own browser, it inherits your existing Redfin cookies, session, and browser fingerprint. To Cloudflare, it looks like a normal user browsing Redfin — because technically, it is. There's no headless browser, no datacenter IP, no mismatched TLS fingerprint. For publicly available pages, Thunderbit's cloud scraping mode can process 50 pages at a time.

That's a fundamentally different posture than firing requests from a Python script on a server.

Your browser session is already trusted.

Scraping Redfin Subpages with Thunderbit

After scraping search results, click "Scrape Subpages" to have the AI visit each property detail URL and enrich your table with additional fields — Redfin Estimate, year built, HOA dues, agent info, property photos, and sale history.

That's the equivalent of the 40-line Selenium loop from Approach 1 — except it takes one click and zero maintenance.

When Redfin changes its DOM from homecardV2Price to span.bp-Homecard__Price--value, the AI adapts. Your Python selectors don't.

Beyond CSV: Export Redfin Data to Google Sheets, Airtable, and Notion

Most tutorials stop at df.to_csv(). That's fine for a one-off analysis. But if you're on a real estate team, you need collaborative, living data — not static files collecting dust on someone's desktop.

Exporting with Python (gspread + Airtable API)

Google Sheets via gspread:

```python
import gspread
import pandas as pd
from gspread_dataframe import set_with_dataframe

df = pd.DataFrame(listings)

gc = gspread.service_account(filename="service_account.json")
sh = gc.open("Redfin Listings")
ws = sh.worksheet("Sheet1")
ws.clear()
set_with_dataframe(ws, df, include_index=False, resize=True)

# Render property photos inline via IMAGE() formula
image_col = df.columns.get_loc("image_url") + 1
for row_idx, url in enumerate(df["image_url"], start=2):
    ws.update_cell(row_idx, image_col, f'=IMAGE("{url}")')
```

Heads up: Sheets has a hard limit of 10 million cells per spreadsheet, and the API enforces per-minute request quotas. Use ws.batch_update() instead of per-cell loops for anything over a few dozen rows.
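As a sketch of that advice, the per-cell IMAGE() loop can be replaced by building the whole payload first and issuing a single batch_update call (the column letter and start row are assumptions matching the earlier example):

```python
def image_formula_batch(urls, col_letter="H", start_row=2):
    """Build a gspread batch_update payload writing one =IMAGE() formula per row."""
    return [
        {"range": f"{col_letter}{start_row + i}", "values": [[f'=IMAGE("{url}")']]}
        for i, url in enumerate(urls)
    ]

# ws.batch_update(image_formula_batch(df["image_url"]), value_input_option="USER_ENTERED")
```

Passing value_input_option="USER_ENTERED" is what makes Sheets evaluate the formulas instead of storing them as text.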

Airtable via pyairtable:

Critical 2024 change: Airtable retired legacy API keys in early 2024. You must use Personal Access Tokens (PATs) now — any tutorial still showing api_key=... is broken.

```python
from pyairtable import Api

api = Api("patXXXXXXXXXXXXXX.yyyyyyyyyyyyyyyyyyyy")
table = api.table("appBaseId123", "Redfin Listings")
records = [
    {
        "Address": row["address"],
        "Price": row["price"],
        "Beds": row["beds"],
        "Photo": [{"url": row["image_url"]}],  # Airtable fetches & re-hosts
    }
    for row in listings
]
created = table.batch_create(records, typecast=True)
```

Airtable's rate limit is 5 requests per second per base, with a 30-second lockout on violation. The attachment field accepts [{"url": ...}] payloads — Airtable's servers fetch the URL, re-host it on their CDN, and generate thumbnails automatically.

Exporting with Thunderbit (1-Click to Sheets, Airtable, Notion)

Thunderbit has native 1-click export to Google Sheets, Airtable, and Notion — and here's the part I'm genuinely proud of: property photos are uploaded and rendered as inline images in Notion and Airtable. No =IMAGE() formula hacks, no broken CDN links. You click "Export to Airtable," and your team gets a visual property database with thumbnails they can browse on their phones.

For real estate teams doing visual listing triage, this is the difference between a useful tool and a pile of CSV rows.

Is Scraping Redfin Legal? robots.txt, Terms, and Case Law

I'm not a lawyer, and this isn't legal advice. But after years in the data extraction space, I can tell you: "is it legal?" is the question everyone asks and most tutorials dodge.

Redfin's robots.txt

Redfin's robots.txt is detailed. Key points:

  • Fully blocked bots: peer39_crawler/1.0, AmazonAdBot, FireCrawlAgent — Redfin is specifically naming FireCrawl, the popular LLM-era scraping service
  • Wildcard User-agent: * Disallow highlights: /stingray/ (the entire internal API namespace), /myredfin/, /api/v1/rentals/, /api/v1/properties/, /owner-estimate/
  • No Crawl-delay: directive for any user agent
  • 50+ sitemaps declared — sitemaps are the cleanest, WAF-light way to enumerate URLs

Redfin's Terms of Use

Redfin's Terms of Use states: "You may not automatedly crawl or query the Services for any purpose or by any means... unless you have received prior express written permission."

This is a browsewrap agreement — acceptance by continued use, not a clickwrap. US courts have historically been skeptical of enforcing browsewrap against users who had no actual notice (see Nguyen v. Barnes & Noble, 9th Cir. 2014).

Relevant Case Law (Brief)

  • Van Buren v. United States (Supreme Court, 2021): The CFAA's "exceeds authorized access" clause uses a "gates-up-or-down" test. Using an open door for an unwelcome purpose is not federal hacking.
  • hiQ Labs v. LinkedIn (9th Cir., 2022): Scraping publicly available data is not a CFAA violation. But hiQ ultimately paid $500,000 in a settlement on breach-of-contract grounds — because hiQ had registered LinkedIn accounts and clicked "I agree."
  • Meta Platforms v. Bright Data (N.D. Cal., Jan. 2024): Court granted summary judgment for Bright Data — logged-off scraping of public data did not make Bright Data a "user" bound by Meta's ToS.
  • X Corp. v. Bright Data (N.D. Cal., May 2024): Judge Alsup dismissed X's claims, holding that state-law claims trying to control copying of public content were preempted by the Copyright Act.

Practical Guidance

  • Scrape only publicly accessible data — never register an account and then scrape (that creates clickwrap contract exposure)
  • Respect rate limits — aggressive volumes support trespass-to-chattels claims
  • Do not republish raw data or photos at scale — the pending photo-copyright lawsuit over listing images (filed July 2025, potential damages exceeding $1 billion) is a reminder that photo copyright is serious
  • Thunderbit's browser-based approach — running in your own authenticated session — is closer to "manual browsing at machine speed" than a headless datacenter bot, which is the most defensible posture short of a licensed API

Tips and Common Pitfalls

A few hard-won lessons from building extraction tools and watching thousands of users scrape real estate sites:

  • Always warm your session. Hit redfin.com/ before any deep URL. Cold deep-URL hits are the #1 trigger for Cloudflare challenges.
  • Rotate User-Agent strings realistically. Don't just use one — rotate through 5–10 current Chrome/Firefox UAs. But don't rotate too aggressively (different UA every request looks suspicious).
  • Deduplicate by property ID. Redfin's pagination sometimes overlaps. Parse the /home/{id} from each listing URL and deduplicate before enriching.
  • Don't scrape during peak hours if you can avoid it. Late night / early morning US time sees less WAF scrutiny in my experience.
  • If you get a 429, back off exponentially. Don't retry immediately — that's how you escalate from a soft rate-limit to a hard IP ban.
  • For large-scale projects (1,000+ pages), budget for residential proxies. Datacenter IPs (AWS, GCP, Azure, OVH) are blacklisted by Cloudflare's ASN reputation system. You'll hit Error 1020 almost immediately.
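The dedup tip above is mechanical enough to sketch (the dict field names match the listing dicts built in Approach 1):

```python
import re

def property_id(url):
    """Extract Redfin's internal id from a /home/{id} listing URL."""
    m = re.search(r"/home/(\d+)", url or "")
    return m.group(1) if m else None

def dedupe(listings):
    """Drop rows whose property id was already seen; keep rows without an id."""
    seen, unique = set(), []
    for row in listings:
        pid = property_id(row.get("url"))
        if pid in seen:
            continue
        if pid:
            seen.add(pid)
        unique.append(row)
    return unique
```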

Picking the Right Way to Scrape Redfin

So which approach should you pick? It depends on who you are and what you need.

HTML Parsing (BeautifulSoup + Selenium): Best for developers who want full control, are comfortable maintaining CSS selectors, and don't mind rebuilding when Redfin changes its DOM. Expect to revisit your code every 6–12 months.

Hidden Stingray API: Best for developers who need clean, structured JSON and can handle reverse-engineering undocumented endpoints. Lower maintenance than HTML parsing, but endpoints can change without notice. Remember that /stingray/ is explicitly disallowed in robots.txt.

Thunderbit (No-Code): Best for non-developers, quick projects, and teams that need ongoing Redfin data without developer resources. AI adapts to layout changes, subpage scraping enriches data with one click, and export to Google Sheets, Airtable, or Notion is built in. If you're a real estate team that needs a living property database — not a one-time CSV dump — this is the path of least resistance.

Whichever path you take: understand Redfin's anti-bot defenses before you start, know what fields you need, pick an export format that fits your team's workflow, and stay on the right side of the legal landscape outlined above.

Ready to try the no-code path? Thunderbit lets you experiment with Redfin scraping and see results in minutes. For the Python approaches, the code snippets above are a working starting point — just add proxies and patience.

FAQs

Does Redfin have a public API?

No. Redfin does not offer an official public API. The hidden Stingray API (/stingray/api/home/details/*) returns structured JSON and is used by Redfin's own frontend, but it's unofficial, undocumented, subject to change without notice, and explicitly disallowed in Redfin's robots.txt. Open-source wrapper packages on PyPI provide Python access, but use them understanding the risks.

Can I scrape Redfin without Python?

Yes. Thunderbit is an AI Chrome extension that inherits your browser session for anti-bot resilience — install it, navigate to Redfin, click "AI Suggest Fields," and export to Excel, Google Sheets, Airtable, or Notion. There are also other no-code scraping tools and prebuilt dataset providers in the market if you want to explore alternatives.

How often does Redfin change its website layout?

Community GitHub issue history shows CSS-selector breakage roughly every 6–12 months. Redfin has shipped two card DOM generations — legacy (homecardV2Price, homeAddressV2) and current (bp-Homecard__Price--value, bp-Homecard__Address). Mature scrapers try both in sequence.

AI-based tools like Thunderbit are more resilient to these changes because they detect fields by content rather than CSS selectors.

What's the best proxy type for scraping Redfin?

US residential proxies for large-scale scraping — community benchmarks put the success rate around 80%. Datacenter proxies hit Cloudflare Error 1020 almost immediately; AWS, GCP, Azure, and OVH IP ranges are blacklisted. Mobile proxies have the highest success rate but cost 5–10x more.

For small-scale personal scraping (<100 pages), proper headers + curl_cffi impersonation + 2–5 second delays may work without proxies at all.

Can I scrape sold or off-market property data from Redfin?

Yes. Sold property data and the off-market Redfin Estimate (7.52% median error) are available on detail pages using the same scraping approaches. The fields differ from active listings: off-market pages expose sold price, sold date, property history, and the owner-estimate endpoint, but lack current list price, days on market, and open house info. The Stingray API endpoint for off-market estimates is api/home/details/owner-estimate rather than api/home/details/avm.
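That endpoint switch fits in a one-line helper (propertyId handling as in the Stingray examples above):

```python
def estimate_endpoint(property_id: int, off_market: bool) -> str:
    """Pick the AVM endpoint: owner-estimate for off-market, avm for on-market."""
    name = "owner-estimate" if off_market else "avm"
    return f"https://www.redfin.com/stingray/api/home/details/{name}?propertyId={property_id}"
```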


Shuai Guan
Co-founder/CEO @ Thunderbit. Passionate about cross section of AI and Automation. He's a big advocate of automation and loves making it more accessible to everyone. Beyond tech, he channels his creativity through a passion for photography, capturing stories one picture at a time.