What's the difference between Distill and Extract?

Distill converts any URL into clean Markdown, stripping ads, navigation, and noise. Extract takes a URL plus a JSON Schema and returns structured JSON or CSV data. Use Distill for content ingestion (RAG, knowledge bases) and Extract for structured data collection (prices, listings, contacts).

Does it work with JavaScript-heavy sites?

Yes. Thunderbit's API has full JavaScript rendering and anti-bot bypass built in. It handles SPAs, dynamic content, and pages that need JS execution to load their data.

Will extraction break when a site redesigns?

No. Thunderbit reads meaning, not DOM structure. Traditional scrapers rely on CSS selectors and XPath that break on every redesign. Thunderbit's AI understands the semantic content of the page, so extraction keeps working even when the HTML changes underneath.

What is the confidence score?

The confidence score indicates how certain Thunderbit's AI is about the extracted data. It helps you programmatically decide whether to trust a result or flag it for review.
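
One way to act on the score is a simple threshold gate. The sketch below assumes each extracted record carries a numeric `confidence` field between 0 and 1. The field name, its range, and the 0.8 cutoff are illustrative assumptions, not documented API behavior, so check the API reference for the actual response shape.

```python
from typing import Any


def split_by_confidence(
    records: list[dict[str, Any]],
    threshold: float = 0.8,  # assumed cutoff; tune it for your own data
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Partition records into (trusted, needs_review).

    Assumes each record has a numeric "confidence" key in [0, 1].
    Records without a usable score are routed to review.
    """
    trusted, needs_review = [], []
    for record in records:
        score = record.get("confidence")
        if isinstance(score, (int, float)) and score >= threshold:
            trusted.append(record)
        else:
            needs_review.append(record)
    return trusted, needs_review


# Hypothetical records, shaped for illustration only.
sample = [
    {"name": "Widget A", "price": 19.99, "confidence": 0.97},
    {"name": "Widget B", "price": None, "confidence": 0.42},
    {"name": "Widget C", "price": 5.00},  # no score at all
]
trusted, needs_review = split_by_confidence(sample)
```

Sending unscored records to review rather than trusting them is a deliberately conservative choice for this sketch.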

How long do batch jobs take?

Batch processing times depend on the number of URLs and the complexity of each page. Distill supports up to 100 URLs per request and Extract supports up to 50 URLs per request. Most batch jobs complete within minutes.
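
If you have more URLs than one request allows, you can split the list client-side before submitting. This is a minimal sketch that only uses the per-request limits stated above; the helper name `chunk_urls` is ours, and how each batch is submitted is left out.

```python
from collections.abc import Iterator

# Per-request limits stated in the answer above.
DISTILL_BATCH_LIMIT = 100
EXTRACT_BATCH_LIMIT = 50


def chunk_urls(urls: list[str], limit: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `urls`, each at most `limit` long."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    for start in range(0, len(urls), limit):
        yield urls[start:start + limit]


# 230 placeholder URLs split for each endpoint.
urls = [f"https://example.com/page/{i}" for i in range(230)]
distill_batches = list(chunk_urls(urls, DISTILL_BATCH_LIMIT))
extract_batches = list(chunk_urls(urls, EXTRACT_BATCH_LIMIT))
```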

AI-Powered Web Scraper API

Zero Maintenance. Ever.

One API call to turn any webpage into Markdown or tables. Fuel your agent with live web data, build RAG, and enrich databases — we handle the infrastructure.

Product Hunt: #1 Product of the Week

Users Worldwide: 200K+

Up and running in minutes

Try it in your terminal right now.

URL to Markdown

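import requests

API_KEY = "YOUR_API_KEY"  # placeholder: substitute your own key

# Send the page URL to the Distill endpoint.
resp = requests.post(
    "https://openapi.thunderbit.com/openapi/v1/distill",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"url": "https://example.com/article"},
)
resp.raise_for_status()  # surface HTTP errors before parsing the body

# The cleaned Markdown is returned under data.markdown.
markdown = resp.json()["data"]["markdown"]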

Core API

Two core capabilities

Distill for clean content, Extract for structured data

Distill

URL→Markdown

Strips ads, nav, and noise — keeps only the content that matters

Full JS rendering and anti-bot bypass built in

Batch up to 100 URLs per request

Extract

URL + Schema→JSON / CSV

One schema works across all websites — no per-site maintenance

Survives site redesigns automatically

Batch up to 50 URLs per request

Advantages

Why use Thunderbit

The scraping / data extraction infrastructure your AI agent deserves

Define what, not how

No CSS selectors, no XPath, no per-site rules. Describe the data you need with a JSON Schema — AI figures out where it lives and how to get it.
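
To make "describe the data you need" concrete, here is a sketch of a product schema and the request body it might travel in. The schema itself is ordinary JSON Schema. The `/extract` path and the `url` and `schema` key names are assumptions modeled on the Distill quick start, not confirmed API details, so treat them as hypothetical until checked against the docs.

```python
import json

# Hypothetical path: assumed to mirror the Distill endpoint.
EXTRACT_URL = "https://openapi.thunderbit.com/openapi/v1/extract"

# A standard JSON Schema describing the fields to pull from a product page.
PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "description": "Product title"},
        "price": {"type": "number", "description": "Current price"},
        "currency": {"type": "string", "description": "ISO 4217 code, e.g. USD"},
        "in_stock": {"type": "boolean"},
    },
    "required": ["name", "price"],
}


def build_extract_payload(url: str, schema: dict) -> dict:
    """Pair a target URL with a schema. The key names are assumed."""
    return {"url": url, "schema": schema}


payload = build_extract_payload("https://example.com/product/123", PRODUCT_SCHEMA)
body = json.dumps(payload)  # the JSON string that would be sent as the POST body
```

Note that nothing in the schema mentions selectors or page layout: it names fields and types only, which is what lets the same definition be reused across sites.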

One schema, every website

The same schema works across e-commerce sites, sales listings, or any other URL you throw at it. Adding a new data source is a config change, not an engineering sprint.

Stays working when sites break

Traditional scrapers die on every redesign. Thunderbit reads meaning, not DOM structure — so extraction keeps working even when the HTML changes underneath.

Industries

Use cases

What you can build with Thunderbit

AI Agents with Web Access

Give your agent the ability to read and understand any webpage. One API call returns structured context, ready for your agent's next step.

RAG & Knowledge Bases

Distill any URL into clean Markdown and feed it straight into your vector database. No HTML parsing, no content cleaning scripts.
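
Before embedding, most pipelines still split the Markdown into smaller chunks. The sketch below shows one simple approach, splitting on headings; it is our illustration of a common preprocessing step, not part of the Thunderbit API, and it does not special-case `#` lines inside fenced code blocks.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")  # ATX headings: "# Title" to "###### Title"


def split_markdown_by_heading(markdown: str) -> list[dict[str, str]]:
    """Split Markdown into sections, one per heading.

    Text before the first heading becomes a section with an empty title.
    """
    sections: list[dict[str, str]] = []
    title, lines = "", []

    def flush() -> None:
        # Skip a leading section that has neither a title nor any text.
        if title or any(line.strip() for line in lines):
            sections.append({"title": title, "text": "\n".join(lines).strip()})

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            title, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    flush()
    return sections


doc = "Intro text.\n\n# Setup\nInstall it.\n\n## Usage\nRun it.\n"
chunks = split_markdown_by_heading(doc)
```

Each chunk keeps its heading as `title`, which can be stored as metadata alongside the embedding.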

Turn Any Website into an API

Define a schema, point at a URL, get JSON back. Build a product price API, a job listing API, or a news feed API — without writing a single scraper.

Database Enrichment

Keep your database fresh with live web data. Pull company profiles, contact info, or listing details on a schedule — schema stays the same even when sources change.

Competitive Monitoring

Track prices, inventory, reviews, or content changes across hundreds of pages. Same schema, same pipeline, add new sources in seconds.
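
Once each run produces the same fields, change detection reduces to comparing two snapshots. A minimal sketch, assuming you have already reduced each run to a URL-to-price mapping; the snapshot shape and the `diff_prices` helper are hypothetical, not part of the API.

```python
def diff_prices(
    previous: dict[str, float],
    current: dict[str, float],
) -> dict[str, tuple[float, float]]:
    """Return {url: (old_price, new_price)} for URLs whose price changed.

    URLs present in only one snapshot are ignored here.
    """
    return {
        url: (previous[url], price)
        for url, price in current.items()
        if url in previous and previous[url] != price
    }


# Hypothetical snapshots from two consecutive runs.
yesterday = {"https://example.com/a": 19.99, "https://example.com/b": 5.00}
today = {
    "https://example.com/a": 17.49,
    "https://example.com/b": 5.00,
    "https://example.com/c": 12.00,  # new page, so there is no old price to compare
}
changes = diff_prices(yesterday, today)
```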

Dataset Building

Build training sets, evaluation benchmarks, or research datasets from the open web. Batch process thousands of URLs into consistently structured output.

We build Thunderbit on this API

The same API you're looking at powers Thunderbit's Chrome Extension and web app — used by 200,000+ users to extract tens of millions of pages every month. This isn't a side project. It's the infrastructure we bet our own product on.

Tens of millions

Pages processed monthly and growing

200K+

Users on Thunderbit Extension

Plan