Skills + CLI
Run Thunderbit from the terminal: distill pages into Markdown, extract structured data, suggest fields, and batch-process URLs (up to 100 per distill job, 50 per extract job). The CLI works standalone or as a skills toolkit that AI coding agents can discover.
Installation
The CLI is published on npm as @thunderbit/thunderbit-cli and exposes a thunderbit binary on your PATH.
# Install globally
npm install -g @thunderbit/thunderbit-cli
# Or run one-shot via npx
npx -y @thunderbit/thunderbit-cli --help
A Python (pip install thunderbit) flavour with the same command surface is on the roadmap.
Authentication
Before using the CLI, authenticate with your Thunderbit API key. Get a key from the Thunderbit Dashboard. Keys have the form tb_ followed by 32 hex characters.
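The key format above lends itself to a quick local check before any network call. The following is a minimal sketch, not part of the CLI: is_thunderbit_key is a hypothetical helper, and accepting both upper- and lower-case hex digits is an assumption.

```shell
# Sketch: succeed only if $1 looks like tb_ followed by 32 hex characters.
# Accepting upper-case hex digits as well as lower-case is an assumption.
is_thunderbit_key() {
  printf '%s\n' "$1" | grep -Eq '^tb_[0-9a-fA-F]{32}$'
}

# Usage: fail fast in a script before calling the CLI.
# is_thunderbit_key "$THUNDERBIT_API_KEY" || { echo "malformed API key" >&2; exit 1; }
```

A check like this only catches typos and truncated values; whether the key is actually valid is still decided by the server.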
Set via environment variable
export THUNDERBIT_API_KEY=tb_YOUR_API_KEY
Pass per command
thunderbit --api-key tb_YOUR_API_KEY distill https://example.com
Self-Hosted / Local Development
For self-hosted Thunderbit gateways, override the base URL:
# Per call
thunderbit --base-url https://api.your-domain.com distill https://example.com
# Or set via environment variable
export THUNDERBIT_API_BASE_URL=https://api.your-domain.com
thunderbit distill https://example.com
Check version
thunderbit --version
# or
thunderbit -V
Global Options
These flags are available for every command:
| Option | Description |
|---|---|
| --api-key <key>, -k | API key (or set THUNDERBIT_API_KEY) |
| --base-url <url> | API base URL (or set THUNDERBIT_API_BASE_URL) |
| --format <format>, -f | Output format: json, table, or markdown (default json) |
| --version, -V | Print CLI version |
| --help, -h | Show command help |
Commands
Distill
Distill a single URL into clean, LLM-ready Markdown.
# Basic usage
thunderbit distill https://example.com/article
# Stream Markdown to stdout
thunderbit distill https://example.com --format markdown
# Save to file
thunderbit distill https://example.com --format markdown > article.md
Distill Options
# Use the basic JS renderer (covers most modern sites)
thunderbit distill https://example.com --render-mode basic
# Use the full headless browser (slowest, highest fidelity)
thunderbit distill https://example.com --render-mode full
# Geo-target for region-aware sites
thunderbit distill https://example.com --country-code DE
# Bump per-page timeout
thunderbit distill https://example.com --timeout 60000
# Use sync /distill instead of the default async submit + poll
thunderbit distill https://example.com --sync
Available Options:
| Option | Default | Description |
|---|---|---|
| --render-mode <mode> | none | none, basic, or full |
| --timeout <ms> | 30000 | Per-page request timeout in ms |
| --country-code <CC> | US | ISO 2-letter code, uppercase |
| --sync | false | Use sync mode instead of async submit + poll |
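Since the three render modes trade speed for fidelity, one possible pattern is to start with the cheapest mode and escalate only when a call fails. The sketch below assumes the documented behaviour that a failed call exits non-zero; distill_with_fallback is a hypothetical helper, and treating empty Markdown as a miss is a heuristic of this sketch, not CLI behaviour.

```shell
# Sketch: try render modes from cheapest to most expensive, stop at the first
# call that exits 0 and prints something. Empty output counts as a miss
# (a heuristic chosen for this example).
distill_with_fallback() {
  url=$1
  for mode in none basic full; do
    if out=$(thunderbit distill "$url" --format markdown --render-mode "$mode") \
        && [ -n "$out" ]; then
      printf '%s\n' "$out"
      return 0
    fi
  done
  echo "distill failed in every render mode: $url" >&2
  return 1
}

# Usage:
# distill_with_fallback https://example.com/article > article.md
```

This only covers hard failures. A JavaScript-heavy page may still return thin content with exit code 0 in none mode, so pages known to need rendering are better called with an explicit --render-mode.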
Extract
Extract structured data from a page. The schema is a flat map of fieldName → natural-language instruction — each value is a hint the AI uses to find the field on the page.
Note: the upstream OpenAPI spec example shows JSON Schema ({type:"object",properties:…}). At time of writing the live server expects the flat instruction map shown below; we're aligning the spec.
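The --schema flag accepts a file path as well as inline JSON. As a minimal sketch of what such a file might contain, the snippet below writes a flat field-to-instruction map to schema.json, reusing the field names from the product example in this section; no JSON Schema wrapper is involved.

```shell
# Sketch: write a flat field -> instruction map to schema.json.
# The quoted heredoc delimiter keeps the shell from expanding anything inside.
cat > schema.json <<'EOF'
{
  "name": "product name",
  "price": "the listed price as a number",
  "currency": "3-letter currency code"
}
EOF
```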
# Inline schema — flat map of field → instruction
thunderbit extract https://example.com/product \
--schema '{"name":"product name","price":"the listed price as a number","currency":"3-letter currency code"}'
# Schema from file
thunderbit extract https://example.com/product --schema ./schema.json
# Save the extracted JSON
thunderbit extract https://example.com/product --schema ./schema.json --format json -o data.json
The response always returns data.data as an array, one element per page region matching your schema:
{
"success": true,
"data": {
"url": "https://example.com/product",
"data": [
{ "name": "iPhone 15 Pro", "price": 999, "currency": "USD" }
]
}
}
Interactive mode
# AI proposes fields, you toggle/edit, then extraction runs with the curated schema
thunderbit extract https://example.com/product --interactive
# Steer the suggestion with a prompt
thunderbit extract https://example.com/product -i --prompt "focus on pricing and availability"
# Persist the schema for reuse
thunderbit extract https://example.com/product -i --save-schema ./product-schema.json
Extract Options
# Use the full headless browser for SPAs
thunderbit extract https://example.com --schema ./schema.json --render-mode full
# Sync mode
thunderbit extract https://example.com --schema ./schema.json --sync
# Longer timeout for complex pages
thunderbit extract https://example.com --schema ./schema.json --timeout 120000
Available Options:
| Option | Default | Description |
|---|---|---|
| --schema <json-or-file> | - | Inline JSON or path to schema file |
| --interactive, -i | false | Suggest → curate → extract in one go |
| --prompt <text> | - | Hint for AI suggestion (with -i) |
| --render-mode <mode> | none | none, basic, or full |
| --timeout <ms> | 60000 | Per-page request timeout in ms |
| --sync | false | Sync mode |
| --save-schema <file> | - | Persist the final schema for reuse |
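Because data.data is always an array, a downstream step can iterate rows without special-casing single results. The following sketch flattens an extract response into tab-separated lines with jq; rows_to_tsv is a hypothetical helper, and the field names name, price, and currency are taken from the sample response, so they would need adjusting for other schemas.

```shell
# Sketch: read an extract response on stdin, print one TSV line per row.
# Field names come from the sample schema in this section.
rows_to_tsv() {
  jq -r '.data.data[] | [.name, .price, .currency] | @tsv'
}

# Usage:
# thunderbit extract "$URL" --schema ./schema.json --format json | rows_to_tsv
```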
Suggest Fields
Let the AI propose extractable fields before you write a schema.
# Basic
thunderbit suggest-fields https://example.com/product
# Steer with a prompt
thunderbit suggest-fields https://example.com/listings --prompt "extract job postings only"
# Region-aware
thunderbit suggest-fields https://example.com --country-code DE
In the interactive editor, toggle fields by number (1 3 5), use add, rm 2, or edit 4 to change the list, then type done to confirm. suggest-fields returns [{name, type, instruction}, …]; before feeding that into extract, transform it into a flat map:
thunderbit suggest-fields "$URL" --format json \
| jq 'map({(.name): .instruction}) | add' > schema.json
thunderbit extract "$URL" --schema ./schema.json
Available Options:
| Option | Default | Description |
|---|---|---|
| --prompt <text> | - | Steering hint |
| --country-code <CC> | US | ISO 2-letter code |
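Since extract expects the flat map rather than the suggest-fields array or a JSON Schema document, a quick local check can catch a skipped transform before a request is sent. This is a sketch using jq; is_flat_schema is a hypothetical helper, not a CLI feature.

```shell
# Sketch: succeed only if file $1 is a non-empty JSON object whose values
# are all strings (the flat field -> instruction shape).
is_flat_schema() {
  jq -e 'type == "object" and length > 0 and all(.[]; type == "string")' "$1" >/dev/null
}

# Usage:
# is_flat_schema ./schema.json || { echo "schema.json is not a flat map" >&2; exit 1; }
```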
Batch Distill
Submit up to 100 URLs in a single batch job. Defaults to submit + poll until COMPLETED / FAILED / CANCELLED.
# URLs as positional args
thunderbit batch distill https://a.com https://b.com https://c.com
# Or read URLs from a file (one per line)
thunderbit batch distill --file urls.txt
# Submit only — print the job ID and exit (use webhook or poll later)
thunderbit batch distill --file urls.txt --no-poll
Batch Distill Options
# Bump per-page timeout
thunderbit batch distill --file urls.txt --timeout 60000
# Pipe results into another tool
thunderbit batch distill --file urls.txt --format json \
| jq -r '.data.results[] | select(.success == true) | .markdown' \
> distilled.md
Available Options:
| Option | Default | Description |
|---|---|---|
| --file <path> | - | Read URLs from file (one per line) |
| --timeout <ms> | 30000 | Per-page request timeout in ms |
| --no-poll | false | Submit only, print job ID, exit |
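A URL list longer than the 100-URL cap has to be split across several jobs. One way to do that, sketched here with the standard split utility, is to cut the list into files of at most 100 lines and submit them one by one; chunk_urls and the chunks directory are hypothetical names chosen for the example.

```shell
# Sketch: split URL list $1 into files of at most $2 lines under directory $3.
chunk_urls() {
  mkdir -p "$3" && split -l "$2" "$1" "$3/urls_"
}

# Usage (100 is the documented batch distill cap):
# chunk_urls urls.txt 100 chunks
# for f in chunks/urls_*; do
#   thunderbit batch distill --file "$f" --format json \
#     > "result_$(basename "$f").json" || break
# done
```

Stopping at the first failed job with break is a conservative choice for the sketch; a longer-running pipeline might prefer to log the failure and continue.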
Cancel a running batch distill job
thunderbit batch cancel-distill <jobId>
Already-completed pages keep their results. Pending pages are dropped and you stop being charged for them. Status flips to CANCELLED once the server acks.
Batch Extract
Submit up to 50 URLs with a shared schema.
# URLs as positional args + schema file
thunderbit batch extract https://a.com https://b.com \
--schema ./schema.json
# Read URLs from a file
thunderbit batch extract --file urls.txt --schema ./schema.json
# Submit only
thunderbit batch extract --file urls.txt --schema ./schema.json --no-poll
Available Options:
| Option | Default | Description |
|---|---|---|
| --file <path> | - | Read URLs from file (one per line) |
| --schema <json-or-file> | - | Inline JSON or schema file (required) |
| --timeout <ms> | 60000 | Per-page request timeout in ms |
| --no-poll | false | Submit only, print job ID, exit |
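With a 50-URL cap, it can help to clean the list before submitting. The sketch below drops blank lines and duplicates (keeping first-seen order) and refuses lists that are still over the limit. prepare_urls is a hypothetical helper; whether the CLI itself skips blank lines or duplicates is not stated here, so this is purely a client-side precaution.

```shell
# Sketch: print the unique, non-blank lines of file $1 in original order;
# fail if more than $2 remain.
prepare_urls() {
  count=$(awk 'NF && !seen[$0]++ { n++ } END { print n + 0 }' "$1")
  if [ "$count" -gt "$2" ]; then
    echo "too many URLs: $count (limit $2)" >&2
    return 1
  fi
  awk 'NF && !seen[$0]++' "$1"
}

# Usage (50 is the documented batch extract cap):
# prepare_urls urls.txt 50 > urls.clean.txt \
#   && thunderbit batch extract --file urls.clean.txt --schema ./schema.json
```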
Cancel a running batch extract job
thunderbit batch cancel-extract <jobId>
Same semantics as cancel-distill: completed rows are preserved, pending rows are dropped, billing stops for the remainder.
Output Handling
The CLI writes to stdout by default, making it easy to pipe or redirect.
# Pipe Markdown into another tool
thunderbit distill https://example.com --format markdown | head -50
# Redirect to a file
thunderbit distill https://example.com --format markdown > output.md
# Save extraction JSON
thunderbit extract https://example.com --schema ./schema.json --format json > data.json
Format Behaviour
- --format json (default): full API response as compact JSON, including success, data, creditsUsed, etc. Pipe into jq.
- --format markdown: raw Markdown body for distill; full JSON for other commands.
- --format table: ASCII table for tabular results (extract, suggest-fields).
# Markdown body only, written to stdout
thunderbit distill https://example.com --format markdown
# Full structured response
thunderbit distill https://example.com --format json
Examples
Quick Distill
# Distill an article
thunderbit distill https://example.com/article --format markdown
# Save HTML-converted Markdown to disk
thunderbit distill https://example.com --format markdown -o page.md
Bulk RAG ingestion
# Distill the pages listed in urls.txt into one TSV row per page (URL, Markdown).
# @tsv escapes tabs and newlines inside the Markdown so each page stays on one line.
thunderbit batch distill --file urls.txt --format json \
| jq -r '.data.results[] | select(.success == true) | [.url, .markdown] | @tsv' \
> corpus.tsv
Discover then extract
# Step 1: AI proposes fields, you curate, schema saved to disk
thunderbit extract https://example.com/product -i --save-schema ./schema.json
# Step 2: re-use across the catalog
thunderbit batch extract --file urls.txt --schema ./schema.json --format json > products.json
CI gate: fail when extraction returns nothing
thunderbit extract "$URL" --schema ./schema.json --format json \
| jq -e '.data.data | length > 0'
Combine with other tools
# Pull the canonical URL out of a distill response
thunderbit distill https://example.com --format json \
| jq -r '.data.metadata.canonicalUrl'
# Pipe distilled content into a model for summarisation
thunderbit distill "$URL" --format markdown \
| claude -p "summarise the article in 5 bullets"
# Count successful pages in a batch
thunderbit batch distill --file urls.txt --format json \
| jq '[.data.results[] | select(.success == true)] | length'
Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success. Result is on stdout in the format chosen by --format. |
| 1 | Any failure: missing API key, auth error, HTTP 4xx/5xx, network error, missing schema file, missing required argument. |
All error text is written to stderr. On failure, stdout stays empty, even with --format json, so a jq pipeline never receives a partial envelope. Check the exit code (or use set -e) before parsing.
Polling progress (e.g. Processing... (3) from async submit + poll) is also written to stderr. Append 2>/dev/null to mute it. Single-page sync calls (--sync) don't emit progress.
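The split between stdout and stderr described above can be wrapped in a small helper that hides progress chatter on success but still surfaces the error text on failure. This is a sketch under those documented semantics; tb_run is a hypothetical name, and buffering stderr means progress is not shown live.

```shell
# Sketch: run thunderbit with the given arguments. Print its stdout only on
# success; on failure, replay the captured stderr and return non-zero.
tb_run() {
  out_file=$(mktemp) && err_file=$(mktemp) || return 1
  if thunderbit "$@" >"$out_file" 2>"$err_file"; then
    cat "$out_file"
    rc=0
  else
    cat "$err_file" >&2
    rc=1
  fi
  rm -f "$out_file" "$err_file"
  return "$rc"
}

# Usage: check the exit code first, then parse.
# resp=$(tb_run extract "$URL" --schema ./schema.json --format json) || exit 1
# printf '%s\n' "$resp" | jq '.data.data | length'
```

Capturing the response in a variable before piping it on keeps the CLI's exit code visible; in a plain pipeline the shell reports the status of the last command instead.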
Troubleshooting
Error: API key is required. Export THUNDERBIT_API_KEY or pass --api-key.
Network errors behind a corporate proxy. Set HTTPS_PROXY and HTTP_PROXY — both Node and Python clients honour them.
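As a minimal sketch of the proxy setup, the two variables can be exported once per shell session. The proxy address below is a placeholder for illustration only; substitute your organisation's proxy host and port.

```shell
# Sketch: route CLI traffic through a proxy for this shell session.
# proxy.internal.example:3128 is a hypothetical address.
export HTTPS_PROXY="http://proxy.internal.example:3128"
export HTTP_PROXY="$HTTPS_PROXY"

# Usage:
# thunderbit distill https://example.com
```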
Slow batch polling. Bump --timeout for the per-page budget. The polling cadence itself is fixed at a few seconds and not configurable from the CLI today.
Open Source
The Thunderbit CLI is MIT-licensed and open source on GitHub (the same repository also ships the MCP server and the Claude Code plugin). It is distributed on npm as @thunderbit/thunderbit-cli.
Related
- MCP Server — same operations, exposed as MCP tools
- SDKs — call from your own code
- API Reference — raw HTTP