Thunderbit Open API
Get Started
Thunderbit Open API provides powerful web distillation and intelligent data extraction capabilities, turning any web content into LLM-ready formats.
Key Features
- Web Distillation: Convert web pages into clean Markdown format, perfect for AI applications
- AI-Powered Extraction: Extract structured data using schemas or natural language prompts
- Batch Processing: Process multiple URLs simultaneously with asynchronous job management
- Enterprise-Ready: Handles JavaScript rendering, anti-bot measures, proxies, and dynamic content automatically
What We Handle For You
- Dynamic Content: JavaScript-rendered pages, SPAs, and AJAX-loaded content
- Anti-Bot Protection: Automatic handling of CAPTCHAs and bot detection systems
- Content Processing: Intelligent cleaning and formatting for optimal AI consumption
- Metadata Extraction: Automatic extraction of titles, descriptions, and structured metadata
Authentication
All API requests require an API Key in the Header:
Authorization: Bearer <YOUR_API_KEY>
Get your API key from the Thunderbit Dashboard.
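The curl examples later in this document translate directly to other HTTP clients. As a minimal sketch, the required header can be built like this in Python (the `THUNDERBIT_API_KEY` environment variable name is an assumption, not part of the API):

```python
import os

def auth_headers(api_key: str) -> dict:
    """Build the headers every Thunderbit Open API request needs."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

# Reading the key from the environment keeps it out of source code.
headers = auth_headers(os.environ.get("THUNDERBIT_API_KEY", "YOUR_API_KEY"))
```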
Rate Limits
| Plan | Request Limit | Concurrency | Best For |
|---|---|---|---|
| Free | 10 requests/min | 2 concurrent | Testing & prototyping |
| Pro | 100 requests/min | 10 concurrent | Production apps |
| Enterprise | 1000 requests/min | 50 concurrent | Large-scale operations |
Output Formats
- Markdown: Clean, LLM-optimized markdown format
- Structured Data: JSON output based on your schema
- Metadata: Automatic extraction of page metadata
Base URL
https://open.thunderbit.com/v1 (Production server)
Authentication
Type: HTTP Bearer (JWT). Supply the API key from the Thunderbit Dashboard in every request header: `Authorization: Bearer YOUR_API_KEY`
Error Responses
BadRequest
Invalid request parameters
Unauthorized
Authentication failed, invalid API Key
RateLimited
Too many requests, rate limit triggered
- X-RateLimit-Limit: Rate limit ceiling
- X-RateLimit-Remaining: Remaining requests
- X-RateLimit-Reset: Reset timestamp
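A client can use these headers to decide how long to wait before retrying. The sketch below assumes the header values are integers, with `X-RateLimit-Reset` being a Unix timestamp in seconds (a common convention, not confirmed by this document):

```python
import time

def backoff_seconds(headers: dict, now: float = None) -> float:
    """Seconds to wait before the next request, based on X-RateLimit-* headers."""
    remaining = int(headers.get("X-RateLimit-Remaining", 0))
    if remaining > 0:
        return 0.0  # still within quota, no need to wait
    reset = int(headers.get("X-RateLimit-Reset", 0))
    if now is None:
        now = time.time()
    # Wait until the window resets; never return a negative delay.
    return max(0.0, reset - now)
```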
Schemas
Error
Standard error response format
- success (boolean): Always false for error responses
- error (object): Error details
- code (string): One of the following error codes:
- INVALID_URL: Invalid URL format
- URL_NOT_ACCESSIBLE: Unable to access target URL
- TIMEOUT: Request timeout
- QUOTA_EXCEEDED: Quota exhausted
- RATE_LIMITED: Rate limit triggered
- INVALID_SCHEMA: Invalid Schema format
- EXTRACTION_FAILED: AI extraction failed
- BATCH_SIZE_EXCEEDED: Batch request count exceeded limit
- INVALID_WEBHOOK_URL: Invalid Webhook URL format or not HTTPS
- WEBHOOK_DELIVERY_FAILED: Webhook callback delivery failed
- message (string): Error description message
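The error envelope above can be mapped onto a typed exception on the client side. This is a sketch, not an official SDK; the choice of which codes count as retryable is an assumption for illustration:

```python
class ThunderbitError(Exception):
    """Wraps the standard error response (code + message)."""
    def __init__(self, code: str, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message

# Assumed-retryable codes; adjust to your own policy.
RETRYABLE = {"TIMEOUT", "RATE_LIMITED", "URL_NOT_ACCESSIBLE"}

def raise_for_error(body: dict) -> None:
    """Raise ThunderbitError if the response body signals failure."""
    if body.get("success"):
        return
    err = body.get("error", {})
    raise ThunderbitError(err.get("code", "UNKNOWN"), err.get("message", ""))
```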
Metadata
Page metadata extracted from HTML meta tags, Open Graph, and Twitter Cards
DistillResult
Result of distilling a web page into clean Markdown format
ExtractResult
Result of AI-powered structured data extraction from a web page
BatchJob
Status and results of a batch processing job
Distill
Convert web pages into clean, LLM-ready Markdown format. Handles JavaScript rendering, dynamic content, and anti-bot protection automatically.
POST /distill - Distill Single Page
Convert a web page into clean, LLM-ready Markdown format.
Use Cases:
- Prepare web content for RAG (Retrieval-Augmented Generation)
- Extract article content for AI processing
- Convert documentation pages to markdown
- Process dynamic web applications
What's Included:
- Clean markdown content with preserved structure
- Automatic removal of ads, navigation, and boilerplate
- Metadata extraction (title, description, language)
- JavaScript rendering for dynamic content
- Automatic handling of anti-bot measures
Output Format:
Returns markdown optimized for LLM consumption with minimal noise and maximum signal.
Request Body
- url (string) *required: The URL of the web page to distill
- timeout (number): Request timeout in milliseconds (default: 30000, max: 60000)
- waitFor (number): Time to wait (in milliseconds) after page load for dynamic content to render before extracting content
- includeTags (string[]): Only include content from these HTML tags (e.g., ['article', 'main', 'div.content'])
- excludeTags (string[]): Exclude content from these HTML tags (e.g., ['nav', 'footer', 'aside'])
- headers (object): Custom HTTP headers to send with the request
Response (200): Success response
- success (boolean): Whether the request succeeded
- data (object):
- url (string): The URL that was distilled
- markdown (string): Clean markdown content extracted from the page
- html (string): Raw HTML content (optional, only if requested)
- metadata (object):
- title (string): Page title extracted from <title> tag or Open Graph
- description (string): Meta description or excerpt
- language (string): Detected language code (ISO 639-1)
- author (string): Article author if available
- publishedDate (string): Publication date if available
- image (string): Featured image URL from Open Graph or Twitter Card
- sourceURL (string): Original URL (may differ from requested URL due to redirects)
- statusCode (integer): HTTP status code of the response
- contentLength (integer): Length of the markdown content in characters
- links (object[]): Links found in the content
Example Request
curl 'https://open.thunderbit.com/v1/distill' \
--header 'Authorization: Bearer YOUR_SECRET_TOKEN' \
--header 'Content-Type: application/json' \
--data '{"url":"https://example.com/article","timeout":30000,"waitFor":2000,"includeTags":["article","main"],"excludeTags":["nav","footer","aside"],"headers":{"User-Agent":"MyBot/1.0"}}'
Example Response
{
"success": true,
"data": {
"url": "https://example.com/article",
"markdown": "# Article Title\n\nContent...",
"html": "<article>...</article>",
"metadata": {
"title": "string",
"description": "string",
"language": "string",
"author": "string",
"publishedDate": "2025-01-01T00:00:00Z",
"image": "string",
"sourceURL": "string",
"statusCode": 1,
"contentLength": 1
},
"links": [
{
"text": "string",
"href": "string"
}
]
}
}
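The curl request above can be assembled and its response unpacked with a small helper sketch; the function names here are illustrative, not part of any official SDK (sending the request itself is left to your HTTP client of choice):

```python
import json

def build_distill_payload(url, timeout=30000, wait_for=None,
                          include_tags=None, exclude_tags=None):
    """Serialize a /distill request body from the documented parameters."""
    body = {"url": url, "timeout": timeout}
    if wait_for is not None:
        body["waitFor"] = wait_for
    if include_tags:
        body["includeTags"] = include_tags
    if exclude_tags:
        body["excludeTags"] = exclude_tags
    return json.dumps(body)

def parse_distill(body: dict):
    """Return (title, markdown) from a successful /distill response."""
    data = body["data"]
    return data["metadata"].get("title", ""), data["markdown"]
```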
POST /batch/distill - Batch Distill Multiple Pages
Distill multiple web pages simultaneously with asynchronous processing.
Use Cases:
- Process entire website sections or categories
- Batch import content into your knowledge base
- Large-scale content migration
- Periodic content updates from multiple sources
How It Works:
1. Submit a batch job with up to 100 URLs
2. Receive a job ID immediately
3. Poll the status endpoint or receive webhook notification
4. Retrieve all results when complete
Features:
- Asynchronous processing for high throughput
- Automatic retry on failures
- Webhook notifications when complete
- Detailed per-URL status and error reporting
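A request body honoring the constraints above (at most 100 URLs, HTTPS-only webhook) can be validated client-side before submission. This is a sketch; the `X-Webhook-Secret` header name is an example of a custom callback header, not an API requirement:

```python
import json

MAX_BATCH_URLS = 100  # documented cap for /batch/distill

def build_batch_payload(urls, webhook_url=None, timeout=30000):
    """Serialize a /batch/distill body, rejecting inputs the API would refuse."""
    if len(urls) > MAX_BATCH_URLS:
        # Would trigger BATCH_SIZE_EXCEEDED server-side.
        raise ValueError(f"batch limited to {MAX_BATCH_URLS} URLs, got {len(urls)}")
    body = {"urls": urls, "timeout": timeout}
    if webhook_url is not None:
        if not webhook_url.startswith("https://"):
            # Would trigger INVALID_WEBHOOK_URL server-side.
            raise ValueError("webhook URL must be HTTPS")
        body["webhook"] = {"url": webhook_url,
                           "headers": {"X-Webhook-Secret": "my-secret"}}
    return json.dumps(body)
```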
Request Body
- urls (string[]) *required: List of URLs to distill, maximum 100
- timeout (number): Timeout per request in milliseconds, default 30000
- webhook (object): Webhook callback configuration, notifies when task completes
- url (string): Webhook callback URL, must be HTTPS
- headers (object): Custom callback headers, can be used for authentication
Response (200): Success response
- success (boolean): Whether the request succeeded
- data (object):
- id (string): Batch task ID
- status (string): Job status (processing, completed, or failed)
- total (integer): Total number of URLs in the batch
- completed (integer): Number of URLs processed so far
- creditsUsed (integer): Credits consumed by the job so far
Example Request
curl 'https://open.thunderbit.com/v1/batch/distill' \
--header 'Authorization: Bearer YOUR_SECRET_TOKEN' \
--header 'Content-Type: application/json' \
--data '{"urls":["https://example.com/page1","https://example.com/page2"],"timeout":1,"webhook":{"url":"string","headers":{}}}'
Example Response
{
"success": true,
"data": {
"id": "batch_abc123",
"status": "processing",
"total": 3,
"completed": 0,
"creditsUsed": 0
}
}
GET /batch/distill/{id} - Get Batch Distill Job Status
Check the status and retrieve results of a batch distill job.
Response States:
- processing: Job is currently running
- completed: All URLs have been processed
- failed: Job encountered a fatal error
Polling Best Practices:
- Poll every 5-10 seconds for jobs with <10 URLs
- Poll every 30-60 seconds for larger jobs
- Use webhooks for better efficiency
Partial Results:
You can retrieve completed results while the job is still processing. The results array will contain all URLs processed so far.
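The polling guidance above can be captured in a small loop. `fetch_status` below stands in for a `GET /batch/distill/{id}` call so the sketch can be exercised without network access; in real use it would issue the HTTP request:

```python
import time

def poll_batch(fetch_status, job_id, interval=5.0, max_polls=120):
    """Poll until the batch job reaches a terminal state, then return its data."""
    for _ in range(max_polls):
        data = fetch_status(job_id)["data"]
        if data["status"] in ("completed", "failed"):
            return data
        time.sleep(interval)  # 5-10s for small jobs, 30-60s for large ones
    raise TimeoutError(f"batch job {job_id} still processing after {max_polls} polls")
```

For long-running jobs, webhooks avoid polling entirely; this loop is a fallback when a callback endpoint is not available.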
Parameters
- id (string) *required: Batch task ID
Response (200): Success response
- success (boolean): Whether the request succeeded
- data (object):
- id (string): Batch task ID
- status (string): Job status (processing, completed, or failed)
- total (integer): Total number of URLs in the batch
- completed (integer): Number of URLs processed so far
- creditsUsed (integer): Credits consumed by the job so far
- results (object[]): Per-URL results processed so far
Example Request
curl 'https://open.thunderbit.com/v1/batch/distill/{id}' \
--header 'Authorization: Bearer YOUR_SECRET_TOKEN'
Example Response
{
"success": true,
"data": {
"id": "batch_abc123",
"status": "string",
"total": 1,
"completed": 1,
"creditsUsed": 1,
"results": [
{
"url": "string",
"success": true,
"markdown": "string",
"error": {
"code": "string",
"message": "string"
}
}
]
}
}
Extract
AI-powered structured data extraction. Define your desired data structure using JSON Schema or natural language prompts, and let our AI extract the information for you.
POST /extract - AI-Powered Structured Extraction
Extract structured data from web pages using AI. Define your desired output structure with JSON Schema, and our AI will intelligently extract the information.
Use Cases:
- Extract product information from e-commerce pages
- Parse job listings into structured format
- Extract contact information and business details
- Convert news articles into structured data
- Scrape pricing tables and specifications
How It Works:
1. Provide a URL and a JSON Schema defining your desired structure
2. Our AI analyzes the page content
3. Extracts data matching your schema
4. Returns validated JSON output
Schema Definition:
Use JSON Schema to define your desired output structure:
- Field types: string, number, boolean, array, object
- Field descriptions: Help the AI understand what to extract
- Required fields: Mark critical fields as required
- Nested structures: Support for complex, nested data
Example Schema:
{
"type": "object",
"properties": {
"title": {
"type": "string",
"description": "Product name or title"
},
"price": {
"type": "number",
"description": "Current price in USD"
},
"availability": {
"type": "boolean",
"description": "Whether the product is in stock"
},
"features": {
"type": "array",
"items": {"type": "string"},
"description": "List of key product features"
}
},
"required": ["title", "price"]
}
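The example schema above can be double-checked against the returned `extract` object on the client side. The sketch below is stdlib-only and deliberately minimal, covering just the `required` list described above; a full JSON Schema library would also check types and nesting:

```python
def missing_required(schema: dict, extracted: dict) -> list:
    """Names from the schema's `required` list that are absent in the result."""
    return [f for f in schema.get("required", []) if f not in extracted]

# The example schema from this section, as a Python dict.
product_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string", "description": "Product name or title"},
        "price": {"type": "number", "description": "Current price in USD"},
    },
    "required": ["title", "price"],
}
```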
Request Body
- url (string) *required: The URL of the web page to extract data from
- schema (object) *required: Data structure definition in JSON Schema format
- timeout (number): Request timeout in milliseconds, default 30000
Response (200): Success response
- success (boolean): Whether the request succeeded
- data (object):
- url (string): The URL that was extracted
- extract (object): Extracted structured data matching your schema
- metadata (object):
- sourceURL (string): Original URL (may differ from requested URL due to redirects)
- statusCode (integer): HTTP status code of the response
- extractedAt (string): Extraction timestamp (ISO 8601)
- confidence (number): AI confidence score (0-1) for the extraction quality
- processingTime (integer): Time taken to process in milliseconds
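One practical use of the confidence score is gating downstream processing. A sketch, assuming the response shape above; the 0.8 threshold is an arbitrary example, not an API recommendation:

```python
def accept_extraction(body: dict, min_confidence: float = 0.8):
    """Return the extracted data only if the AI confidence clears the threshold."""
    data = body.get("data", {})
    conf = data.get("metadata", {}).get("confidence", 0.0)
    if conf >= min_confidence:
        return data.get("extract")
    return None  # caller can retry, flag for review, or discard
```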
Example Request
curl 'https://open.thunderbit.com/v1/extract' \
--header 'Authorization: Bearer YOUR_SECRET_TOKEN' \
--header 'Content-Type: application/json' \
--data '{"url":"https://example.com/product","schema":{"type":"object","properties":{"name":{"type":"string"},"price":{"type":"number"}},"required":["name","price"]},"timeout":1}'
Example Response
{
"success": true,
"data": {
"url": "https://example.com/product",
"extract": {
"name": "iPhone 15 Pro",
"price": 999,
"currency": "USD"
},
"metadata": {
"sourceURL": "string",
"statusCode": 1,
"extractedAt": "2025-01-01T00:00:00Z",
"confidence": 1,
"processingTime": 1
}
}
}
POST /extract/batch - Batch Extract Multiple Pages
Extract structured data from multiple URLs simultaneously using AI.
Use Cases:
- Scrape product catalogs from multiple pages
- Extract data from search result pages
- Batch process listings or directory pages
- Collect competitive intelligence at scale
How It Works:
1. Submit up to 50 URLs with a single schema
2. Get immediate job ID response
3. All URLs are extracted using the same schema
4. Poll for status or receive webhook notification
5. Retrieve all structured results at once
Features:
- Same schema applied to all URLs
- Parallel processing for speed
- Individual error handling per URL
- Webhook notifications available
Request Body
- urls (string[]) *required: List of URLs to extract data from, maximum 50
- schema (object) *required: Data structure definition in JSON Schema format
- timeout (number): Timeout per request in milliseconds, default 30000
- webhook (object): Webhook callback configuration, notifies when task completes
- url (string): Webhook callback URL, must be HTTPS
- headers (object): Custom callback headers, can be used for authentication
Response (200): Success response
- success (boolean): Whether the request succeeded
- data (object):
- id (string): Batch task ID
- status (string): Job status (processing, completed, or failed)
- total (integer): Total number of URLs in the batch
- completed (integer): Number of URLs extracted so far
Example Request
curl 'https://open.thunderbit.com/v1/extract/batch' \
--header 'Authorization: Bearer YOUR_SECRET_TOKEN' \
--header 'Content-Type: application/json' \
--data '{"urls":["string"],"schema":{},"timeout":1,"webhook":{"url":"string","headers":{}}}'
Example Response
{
"success": true,
"data": {
"id": "batch_ext_xyz789",
"status": "string",
"total": 1,
"completed": 1
}
}
GET /extract/batch/{id} - Get Batch Extract Job Status
Check status and retrieve extracted data from a batch extraction job.
Response States:
- processing: Extraction in progress
- completed: All extractions finished
- failed: Job failed (check error details)
Results Format:
Each URL in the results array contains:
- Extracted data matching your schema
- Success/failure status
- Individual error messages if applicable
- Confidence scores for extraction quality
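Since each URL carries its own success flag and error, a common pattern is to partition the results array before further processing. A sketch assuming the result shape described above:

```python
def split_results(results: list):
    """Partition batch-extract results into {url: extract} and {url: error} maps."""
    ok, failed = {}, {}
    for r in results:
        if r.get("success"):
            ok[r["url"]] = r.get("extract", {})
        else:
            err = r.get("error", {})
            failed[r["url"]] = f"{err.get('code')}: {err.get('message')}"
    return ok, failed
```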
Parameters
- id (string) *required: Batch task ID
Response (200): Success response
- success (boolean): Whether the request succeeded
- data (object):
- id (string): Batch task ID
- status (string): Job status (processing, completed, or failed)
- total (integer): Total number of URLs in the batch
- completed (integer): Number of URLs extracted so far
- creditsUsed (integer): Credits consumed by the job
- results (object[]): Per-URL extraction results
Example Request
curl 'https://open.thunderbit.com/v1/extract/batch/{id}' \
--header 'Authorization: Bearer YOUR_SECRET_TOKEN'
Example Response
{
"success": true,
"data": {
"id": "string",
"status": "string",
"total": 1,
"completed": 1,
"creditsUsed": 1,
"results": [
{
"url": "string",
"success": true,
"extract": {},
"error": {
"code": "string",
"message": "string"
}
}
]
}
}