Get Started
Production-grade web extraction infrastructure for AI applications
Thunderbit Open API turns any web page into clean, structured data your LLMs can actually use — while transparently handling JavaScript rendering, anti-bot protection, geo-routing, and proxy rotation.
Quickstart
Five-minute walkthrough with cURL, Python, and Node.js samples.
API Reference
Endpoints, error codes, retry strategy.
Why Thunderbit
| Pain point | Without Thunderbit | With Thunderbit |
|---|---|---|
| JavaScript-heavy SPAs | Self-host headless Chrome, debug timeouts, chase memory leaks | `renderMode: "full"` |
| CAPTCHA / bot walls | Rotate proxies, solve puzzles, watch IPs burn | Handled transparently |
| Geo-blocked content | Manage proxy pools per country | `countryCode: "DE"` |
| HTML noise (ads, nav, popups) | Hand-write per-site readability heuristics | Auto-stripped Markdown |
| Structured extraction | Train extractors, maintain CSS selectors that break weekly | JSON Schema → JSON output |
| Scaling to 10k+ URLs | Build your own queue, retry, dedupe, status board | Batch endpoint + webhook |
| LLM token costs | Feed the model raw HTML and pay for it | Pre-distilled Markdown — 5–10× fewer tokens |
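The `renderMode` and `countryCode` options from the table can be combined in one request. A request might look like the sketch below. It assumes both options are top-level fields of the `/distill` request body, so confirm their placement in the API Reference. The example URL and the `THUNDERBIT_API_KEY` environment variable are placeholders chosen for this sketch.

```bash
#!/usr/bin/env bash
set -euo pipefail

API_BASE="https://openapi.thunderbit.com/openapi/v1"

# ASSUMPTION: renderMode and countryCode sit at the top level of the
# Distill body. The URL below is a hypothetical JavaScript-heavy page.
body='{
  "url": "https://example.com/spa-dashboard",
  "renderMode": "full",
  "countryCode": "DE"
}'

# Only call the API when a key is configured (placeholder variable name).
if [[ -n "${THUNDERBIT_API_KEY:-}" ]]; then
  curl -sS -X POST "$API_BASE/distill" \
    -H "Authorization: Bearer $THUNDERBIT_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$body"
fi
```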
Three core endpoints
🔥 Distill — page → clean Markdown
```bash
curl -X POST https://openapi.thunderbit.com/openapi/v1/distill \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/article"}'
```
Returns LLM-ready Markdown with metadata stripped. 5–10× fewer tokens than raw HTML.
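When you script Distill over many pages, it can help to wrap the call in a function that checks the HTTP status before you trust the output. The following is a minimal sketch. `distill_body`, `distill_to_file`, and the `THUNDERBIT_API_KEY` variable are hypothetical names. The response is saved unparsed because its shape is documented under Models, not here.

```bash
#!/usr/bin/env bash
set -euo pipefail

API_BASE="https://openapi.thunderbit.com/openapi/v1"

# Build the JSON body for one page.
# ASSUMPTION: the URL is valid and percent-encoded, so it contains no
# double quotes or backslashes and can be embedded in JSON verbatim.
distill_body() {
  printf '{"url": "%s"}' "$1"
}

# POST one page to /distill and write the raw response body to a file.
# Returns non-zero unless the HTTP status is 2xx ("000" means no response).
distill_to_file() {
  local url="$1" out="$2" status
  status=$(curl -sS -o "$out" -w '%{http_code}' -X POST "$API_BASE/distill" \
    -H "Authorization: Bearer ${THUNDERBIT_API_KEY:?THUNDERBIT_API_KEY is not set}" \
    -H "Content-Type: application/json" \
    -d "$(distill_body "$url")") || true
  if [[ "$status" != 2* ]]; then
    echo "distill failed for $url (HTTP $status); details may be in $out" >&2
    return 1
  fi
}

# Only hit the network when a key is configured.
if [[ -n "${THUNDERBIT_API_KEY:-}" ]]; then
  distill_to_file "https://example.com/article" article.json
fi
```

The helper returns non-zero on any non-2xx status, so it works with `set -e` or with explicit `||` handling in a larger script.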
🧠 Extract — JSON Schema → structured fields
```bash
curl -X POST https://openapi.thunderbit.com/openapi/v1/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/product",
    "schema": {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "price": { "type": "number" }
      },
      "required": ["name", "price"]
    }
  }'
```
The AI reads the `description` of each field in your schema, so be specific: "product MSRP in USD before discount" beats "price".
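Here is the same request with that advice applied. Each field gets a standard JSON Schema `description` keyword that says exactly what to pull. This is a minimal sketch: the description wording is illustrative, and `THUNDERBIT_API_KEY` is a placeholder variable name.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Extract request whose fields carry JSON Schema "description" keywords.
# The description text is illustrative; tailor it to the pages you target.
body=$(cat <<'JSON'
{
  "url": "https://example.com/product",
  "schema": {
    "type": "object",
    "properties": {
      "name": {
        "type": "string",
        "description": "Product name as displayed on the page"
      },
      "price": {
        "type": "number",
        "description": "Product MSRP in USD before discount"
      }
    },
    "required": ["name", "price"]
  }
}
JSON
)

# Send only when a key is configured (placeholder variable name).
if [[ -n "${THUNDERBIT_API_KEY:-}" ]]; then
  curl -sS -X POST "https://openapi.thunderbit.com/openapi/v1/extract" \
    -H "Authorization: Bearer $THUNDERBIT_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$body"
fi
```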
⚡ Batch — up to 100 URLs, async with webhooks
```bash
curl -X POST https://openapi.thunderbit.com/openapi/v1/batch/distill \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com/page1", "https://example.com/page2"],
    "webhook": {
      "url": "https://your-server.com/webhook/distill",
      "secret": "whsec_your_secret_key"
    }
  }'
```
Submit → fire webhook → fetch results. See Batch Lifecycle.
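A list longer than 100 URLs, such as the 10k+ case in the table, has to be split across several batch requests. The sketch below reads URLs one per line and prints one request body for each chunk of at most 100 URLs, reusing the webhook fields from the example above. It assumes the URLs, webhook URL, and secret contain no characters that need JSON escaping, which holds for valid percent-encoded URLs. `batch_bodies`, `emit_body`, `urls.txt`, and `THUNDERBIT_API_KEY` are hypothetical names.

```bash
#!/usr/bin/env bash
set -euo pipefail

BATCH_LIMIT=100  # documented maximum URLs per batch request

# Print one JSON body for a single chunk of URLs passed as arguments 3..N.
emit_body() {
  local webhook_url="$1" secret="$2" urls
  shift 2
  urls=$(printf '"%s",' "$@")   # "u1","u2",...,
  urls="[${urls%,}]"            # drop trailing comma, wrap in brackets
  printf '{"urls": %s, "webhook": {"url": "%s", "secret": "%s"}}\n' \
    "$urls" "$webhook_url" "$secret"
}

# Read URLs from stdin (one per line, blank lines skipped) and emit one
# body per chunk of at most BATCH_LIMIT URLs.
batch_bodies() {
  local webhook_url="$1" secret="$2" url=""
  local -a chunk=()
  while IFS= read -r url || [[ -n "$url" ]]; do
    if [[ -z "$url" ]]; then continue; fi
    chunk+=("$url")
    if (( ${#chunk[@]} == BATCH_LIMIT )); then
      emit_body "$webhook_url" "$secret" "${chunk[@]}"
      chunk=()
    fi
  done
  if (( ${#chunk[@]} > 0 )); then
    emit_body "$webhook_url" "$secret" "${chunk[@]}"
  fi
}

# Submit each chunk sequentially, only when a key and URL list exist.
if [[ -n "${THUNDERBIT_API_KEY:-}" && -f urls.txt ]]; then
  batch_bodies "https://your-server.com/webhook/distill" "whsec_your_secret_key" < urls.txt |
    while IFS= read -r body; do
      curl -sS -X POST "https://openapi.thunderbit.com/openapi/v1/batch/distill" \
        -H "Authorization: Bearer $THUNDERBIT_API_KEY" \
        -H "Content-Type: application/json" \
        -d "$body"
      echo
    done
fi
```

Sending chunks one at a time keeps the script simple. Check the rate-limits guide before you parallelize submissions.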
Resources
Guides
Render modes, schema design, webhooks, batch lifecycle, rate limits, credits.
Recipes
RAG knowledge base, price monitoring, news aggregation, agent tooling.
SDKs
Python, Node.js, Go, Java, Kotlin, Swift, Elixir, Dart, Bash, and more.
Integrations
LangChain, Vercel AI SDK, MCP, n8n, Zapier, and more.
Models
Shared response and error shapes.