Introduction | Thunderbit API

Production-grade web extraction infrastructure for LLM and AI apps. Markdown distillation, JSON extraction, and async batch jobs from one Open API.

Thunderbit Open API turns any web page into clean, structured data your LLMs can actually use — while transparently handling JavaScript rendering, anti-bot protection, geo-routing, and proxy rotation.

Why Thunderbit

| Pain point | Without Thunderbit | With Thunderbit |
| --- | --- | --- |
| JavaScript-heavy SPAs | Self-host headless Chrome, debug timeouts, watch memory leaks | renderMode: "full" |
| CAPTCHA / bot walls | Rotate proxies, solve puzzles, watch IPs burn | We absorb it |
| Geo-blocked content | Manage proxy pools per country | countryCode: "DE" |
| HTML noise (ads, nav, popups) | Hand-write per-site readability heuristics | Auto-stripped Markdown |
| Structured extraction | Train extractors, maintain CSS selectors that break weekly | JSON Schema → JSON output |
| Scaling to 10k+ URLs | Build your own queue, retry, dedupe, status board | Batch endpoint + webhook |
| LLM token costs | Feed the model raw HTML and pay for it | Pre-distilled Markdown (5–10× fewer tokens) |

Three core endpoints

🔥 Distill — page → clean Markdown

curl -X POST https://openapi.thunderbit.com/openapi/v1/distill \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/article"}'

Returns LLM-ready Markdown with metadata stripped, typically 5–10× fewer tokens than raw HTML.
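For callers who prefer a script over curl, the same request can be sketched with Python's standard library. This is a minimal sketch, not an official client: the helper names `build_distill_request` and `distill` are hypothetical, the placement of `renderMode` and `countryCode` as top-level body fields is an assumption based on the table above, and the response is assumed to be JSON.

```python
import json
import urllib.request

API_BASE = "https://openapi.thunderbit.com/openapi/v1"


def build_distill_request(api_key: str, url: str, **options) -> urllib.request.Request:
    """Build (but do not send) a POST request for the distill endpoint."""
    # ASSUMPTION: extra options such as renderMode="full" or countryCode="DE"
    # are sent as top-level fields next to "url". Check the API reference.
    body = {"url": url, **options}
    return urllib.request.Request(
        f"{API_BASE}/distill",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def distill(api_key: str, url: str, **options) -> dict:
    """Send the request and decode the response (assumed to be JSON)."""
    request = build_distill_request(api_key, url, **options)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Calling `distill("YOUR_API_KEY", "https://example.com/article")` sends the same request as the curl command above. The response shape is not described in this section, so treat the returned dictionary as opaque until you have checked it against the API reference.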

🧠 Extract — JSON Schema → structured fields

curl -X POST https://openapi.thunderbit.com/openapi/v1/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/product",
    "schema": {
      "type": "object",
      "properties": {
        "name":  { "type": "string" },
        "price": { "type": "number" }
      },
      "required": ["name", "price"]
    }
  }'

The AI reads the description on each schema property, so be specific ("product MSRP in USD before discount" beats "price").
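The schema in the curl example carries no descriptions, so here is a sketch of the same payload with one added per field. It uses the standard JSON Schema `description` keyword; the `field` helper and the wording of the descriptions are illustrative only, and how much weight the extractor gives each description is not specified here.

```python
import json


def field(json_type: str, description: str) -> dict:
    """Return a JSON Schema property with a type and a description."""
    return {"type": json_type, "description": description}


product_schema = {
    "type": "object",
    "properties": {
        # Specific descriptions tell the extractor which value to pick
        # when a page shows several candidates (list price, sale price, ...).
        "name": field("string", "Full product name as shown in the page heading"),
        "price": field("number", "Product MSRP in USD before discount"),
    },
    "required": ["name", "price"],
}

# Request body for the extract endpoint, same shape as the curl example.
payload = {"url": "https://example.com/product", "schema": product_schema}
print(json.dumps(payload, indent=2))
```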

⚡ Batch — up to 100 URLs, async with webhooks

curl -X POST https://openapi.thunderbit.com/openapi/v1/batch/distill \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com/page1", "https://example.com/page2"],
    "webhook": {
      "url":    "https://your-server.com/webhook/distill",
      "secret": "whsec_your_secret_key"
    }
  }'

Submit the batch → Thunderbit fires your webhook → fetch the results. See Batch Lifecycle.
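Because a batch accepts up to 100 URLs, a longer list has to be split before submission. The sketch below shows one way to do that; `batch_payloads` is a hypothetical helper, the body shape is copied from the curl example above, and the client-side deduplication is an added convenience rather than documented API behavior.

```python
from typing import Iterable, Iterator

MAX_BATCH_SIZE = 100  # from the "up to 100 URLs" limit stated above


def batch_payloads(
    urls: Iterable[str],
    webhook_url: str,
    webhook_secret: str,
    size: int = MAX_BATCH_SIZE,
) -> Iterator[dict]:
    """Yield one batch request body per chunk of at most `size` URLs."""
    # Drop duplicate URLs while keeping first-seen order.
    unique = list(dict.fromkeys(urls))
    for start in range(0, len(unique), size):
        yield {
            "urls": unique[start:start + size],
            "webhook": {"url": webhook_url, "secret": webhook_secret},
        }
```

Each yielded dictionary can be serialized with `json.dumps` and posted to the batch endpoint as its own job, so a list of 250 unique URLs becomes three submissions.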

Resources