Skills + CLI

Run Thunderbit from the terminal: distill pages into Markdown, extract structured data, suggest fields, and batch-process URLs (up to 100 per distill job, 50 per extract job). The CLI works standalone or as a skills toolkit that AI coding agents can discover.

Installation

The CLI publishes to npm as @thunderbit/thunderbit-cli and exposes a thunderbit binary on your PATH.

# Install globally
npm install -g @thunderbit/thunderbit-cli

# Or run one-shot via npx
npx -y @thunderbit/thunderbit-cli --help

A Python (pip install thunderbit) flavour with the same command surface is on the roadmap.

Authentication

Before using the CLI, authenticate with your Thunderbit API key. Get a key from the Thunderbit Dashboard. Keys have the format tb_ followed by 32 hex characters.

Set via environment variable

export THUNDERBIT_API_KEY=tb_YOUR_API_KEY

Pass per command

thunderbit --api-key tb_YOUR_API_KEY distill https://example.com
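
A quick local check can catch copy-paste mistakes in a key before any request is sent. The sketch below assumes the format stated above (tb_ plus 32 hex characters) and accepts either letter case, which is an assumption. is_thunderbit_key is a hypothetical helper, not part of the CLI.

```shell
# Hypothetical helper (not part of the CLI): succeeds when the argument
# looks like tb_ followed by exactly 32 hex characters.
# Assumption: both upper- and lower-case hex digits are accepted.
is_thunderbit_key() {
  printf '%s' "$1" | grep -Eq '^tb_[0-9a-fA-F]{32}$'
}

# Usage: warn early if the exported key is missing or malformed.
if ! is_thunderbit_key "${THUNDERBIT_API_KEY:-}"; then
  echo "THUNDERBIT_API_KEY is missing or malformed" >&2
fi
```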

Self-Hosted / Local Development

For self-hosted Thunderbit gateways, override the base URL:

# Per call
thunderbit --base-url https://api.your-domain.com distill https://example.com

# Or set via environment variable
export THUNDERBIT_API_BASE_URL=https://api.your-domain.com
thunderbit distill https://example.com

Check version

thunderbit --version
# or
thunderbit -V

Global Options

These flags are available for every command:

| Option | Description |
| --- | --- |
| --api-key <key>, -k | API key (or set THUNDERBIT_API_KEY) |
| --base-url <url> | API base URL (or set THUNDERBIT_API_BASE_URL) |
| --format <format>, -f | Output format: json, table, or markdown (default json) |
| --version, -V | Print CLI version |
| --help, -h | Show command help |

Commands

Distill

Distill a single URL into clean, LLM-ready Markdown.

# Basic usage
thunderbit distill https://example.com/article

# Stream Markdown to stdout
thunderbit distill https://example.com --format markdown

# Save to file
thunderbit distill https://example.com --format markdown > article.md

Distill Options

# Use the basic JS renderer (covers most modern sites)
thunderbit distill https://example.com --render-mode basic

# Use the full headless browser (slowest, highest fidelity)
thunderbit distill https://example.com --render-mode full

# Geo-target for region-aware sites
thunderbit distill https://example.com --country-code DE

# Bump per-page timeout
thunderbit distill https://example.com --timeout 60000

# Use sync /distill instead of the default async submit + poll
thunderbit distill https://example.com --sync

Available Options:

| Option | Default | Description |
| --- | --- | --- |
| --render-mode <mode> | none | none, basic, or full |
| --timeout <ms> | 30000 | Per-page request timeout in ms |
| --country-code <CC> | US | ISO 2-letter code, uppercase |
| --sync | false | Use sync mode instead of async submit + poll |
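
Because full is described as the slowest render mode, one option is to escalate only when a lighter mode does not work. This is a minimal sketch, assuming that a page needing JavaScript either fails (exit code 1) or yields empty Markdown under a lighter mode. distill_with_fallback is a hypothetical wrapper, not a CLI command.

```shell
# Hypothetical wrapper: try render modes from lightest to heaviest and
# print the first non-empty Markdown result.
# Assumption: an unsuitable mode shows up as a non-zero exit or empty output.
distill_with_fallback() {
  url=$1
  for mode in none basic full; do
    if out=$(thunderbit distill "$url" --render-mode "$mode" --format markdown 2>/dev/null) \
        && [ -n "$out" ]; then
      printf '%s\n' "$out"
      return 0
    fi
  done
  echo "all render modes failed for $url" >&2
  return 1
}

# Usage:
#   distill_with_fallback https://example.com > page.md
```

Each failed attempt still costs a request, so this suits one-off pages better than large batches.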

Extract

Extract structured data from a page. The schema is a flat map of fieldName → natural-language instruction — each value is a hint the AI uses to find the field on the page.

Note: the upstream OpenAPI spec example shows JSON Schema ({type:"object",properties:…}). At time of writing the live server expects the flat instruction map shown below; we're aligning the spec.

# Inline schema — flat map of field → instruction
thunderbit extract https://example.com/product \
  --schema '{"name":"product name","price":"the listed price as a number","currency":"3-letter currency code"}'

# Schema from file
thunderbit extract https://example.com/product --schema ./schema.json

# Save the extracted JSON
thunderbit extract https://example.com/product --schema ./schema.json --format json > data.json
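
Given the mismatch between the spec example and the flat map the server expects, it may help to check a schema file before sending it. The sketch below uses jq to confirm the file is a non-empty object whose values are all strings. is_flat_schema is a hypothetical helper, and the check reflects only the flat-map shape described above, not any validation the server itself performs.

```shell
# Hypothetical helper: succeeds only if the file holds a non-empty JSON
# object whose values are all strings (field name -> instruction).
# A JSON Schema document ({"type":"object","properties":{...}}) fails,
# because "properties" maps to an object rather than a string.
is_flat_schema() {
  jq -e 'type == "object" and length > 0 and all(.[]; type == "string")' \
    "$1" >/dev/null 2>&1
}

# Usage:
#   is_flat_schema ./schema.json \
#     && thunderbit extract https://example.com/product --schema ./schema.json
```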

The response always returns data.data as an array, one element per page region matching your schema:

{
  "success": true,
  "data": {
    "url": "https://example.com/product",
    "data": [
      { "name": "iPhone 15 Pro", "price": 999, "currency": "USD" }
    ]
  }
}
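
Since data.data is an array of row objects, it flattens naturally into tab-separated lines for spreadsheets or further shell processing. The sketch below assumes the three field names from the example schema above (name, price, currency). rows_to_tsv is a hypothetical helper that reads the JSON response on stdin.

```shell
# Hypothetical helper: read an extract response on stdin and print one
# tab-separated line per element of data.data.
# Assumption: the schema used the fields name, price, and currency.
rows_to_tsv() {
  jq -r '.data.data[] | [.name, .price, .currency] | @tsv'
}

# Usage:
#   thunderbit extract https://example.com/product --schema ./schema.json \
#     | rows_to_tsv > products.tsv
```

Note that @tsv escapes tabs and newlines inside values, so each row stays on one line.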

Interactive mode

# AI proposes fields, you toggle/edit, then extraction runs with the curated schema
thunderbit extract https://example.com/product --interactive

# Steer the suggestion with a prompt
thunderbit extract https://example.com/product -i --prompt "focus on pricing and availability"

# Persist the schema for reuse
thunderbit extract https://example.com/product -i --save-schema ./product-schema.json

Extract Options

# Use the full headless browser for SPAs that need more render time
thunderbit extract https://example.com --schema ./schema.json --render-mode full

# Sync mode
thunderbit extract https://example.com --schema ./schema.json --sync

# Longer timeout for complex pages
thunderbit extract https://example.com --schema ./schema.json --timeout 120000

Available Options:

| Option | Default | Description |
| --- | --- | --- |
| --schema <json-or-file> | | Inline JSON or path to schema file |
| --interactive, -i | false | Suggest → curate → extract in one go |
| --prompt <text> | | Hint for AI suggestion (with -i) |
| --render-mode <mode> | none | none, basic, or full |
| --timeout <ms> | 60000 | Per-page request timeout in ms |
| --sync | false | Sync mode |
| --save-schema <file> | | Persist the final schema for reuse |

Suggest Fields

Let the AI propose extractable fields before you write a schema.

# Basic
thunderbit suggest-fields https://example.com/product

# Steer with a prompt
thunderbit suggest-fields https://example.com/listings --prompt "extract job postings only"

# Region-aware
thunderbit suggest-fields https://example.com --country-code DE

In the interactive editor, toggle fields by number (1 3 5), or use add, rm 2, and edit 4; type done to confirm.

suggest-fields returns [{name, type, instruction}, …]. To feed that into extract, transform it into a flat map first:

thunderbit suggest-fields "$URL" --format json \
  | jq 'map({(.name): .instruction}) | add' > schema.json

thunderbit extract "$URL" --schema ./schema.json

Available Options:

| Option | Default | Description |
| --- | --- | --- |
| --prompt <text> | | Steering hint |
| --country-code <CC> | US | ISO 2-letter code |

Batch Distill

Submit up to 100 URLs in a single batch job. Defaults to submit + poll until COMPLETED / FAILED / CANCELLED.

# URLs as positional args
thunderbit batch distill https://a.com https://b.com https://c.com

# Or read URLs from a file (one per line)
thunderbit batch distill --file urls.txt

# Submit only — print the job ID and exit (use webhook or poll later)
thunderbit batch distill --file urls.txt --no-poll
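
Since a batch distill job accepts at most 100 URLs, a longer list has to be spread across several jobs. This is a minimal sketch, assuming one URL per line with no blank lines, and that running the jobs one after another is acceptable. distill_in_chunks is a hypothetical helper built on split, not a CLI feature.

```shell
# Hypothetical helper: split a URL file into chunks of 100 lines and
# submit one batch distill job per chunk, sequentially.
# Returns non-zero if any job fails; the remaining chunks still run.
distill_in_chunks() {
  file=$1
  tmpdir=$(mktemp -d) || return 1
  split -l 100 "$file" "$tmpdir/chunk."
  status=0
  for chunk in "$tmpdir"/chunk.*; do
    [ -e "$chunk" ] || continue   # empty input produces no chunk files
    thunderbit batch distill --file "$chunk" --format json || status=1
  done
  rm -rf "$tmpdir"
  return "$status"
}

# Usage:
#   distill_in_chunks urls.txt > responses.jsonl
```

Each job prints its own JSON response, so the combined output is a stream of JSON documents rather than a single envelope. The same approach should carry over to batch extract with a chunk size of 50 and a --schema argument.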

Batch Distill Options

# Bump per-page timeout
thunderbit batch distill --file urls.txt --timeout 60000

# Pipe results into another tool
thunderbit batch distill --file urls.txt --format json \
  | jq -r '.data.results[] | select(.success == true) | .markdown' \
  > distilled.md

Available Options:

| Option | Default | Description |
| --- | --- | --- |
| --file <path> | | Read URLs from file (one per line) |
| --timeout <ms> | 30000 | Per-page request timeout in ms |
| --no-poll | false | Submit only, print job ID, exit |

Cancel a running batch distill job

thunderbit batch cancel-distill <jobId>

Already-completed pages keep their results. Pending pages are dropped and you stop being charged for them. Status flips to CANCELLED once the server acks.


Batch Extract

Submit up to 50 URLs with a shared schema.

# URLs as positional args + schema file
thunderbit batch extract https://a.com https://b.com \
  --schema ./schema.json

# Read URLs from a file
thunderbit batch extract --file urls.txt --schema ./schema.json

# Submit only
thunderbit batch extract --file urls.txt --schema ./schema.json --no-poll

Available Options:

| Option | Default | Description |
| --- | --- | --- |
| --file <path> | | Read URLs from file (one per line) |
| --schema <json-or-file> | | Inline JSON or schema file (required) |
| --timeout <ms> | 60000 | Per-page request timeout in ms |
| --no-poll | false | Submit only, print job ID, exit |

Cancel a running batch extract job

thunderbit batch cancel-extract <jobId>

Same semantics as cancel-distill — completed rows are preserved, pending rows are dropped, billing stops for the remainder.

Output Handling

The CLI writes to stdout by default, making it easy to pipe or redirect.

# Pipe Markdown into another tool
thunderbit distill https://example.com --format markdown | head -50

# Redirect to a file
thunderbit distill https://example.com --format markdown > output.md

# Save extraction JSON
thunderbit extract https://example.com --schema ./schema.json --format json > data.json

Format Behaviour

  • --format json (default): full API response as compact JSON, including success, data, creditsUsed, etc. Pipe into jq.
  • --format markdown: raw Markdown body for distill; full JSON for other commands.
  • --format table: ASCII table for tabular results (extract, suggest-fields).

# Markdown body only, written to stdout
thunderbit distill https://example.com --format markdown

# Full structured response
thunderbit distill https://example.com --format json

Examples

Quick Distill

# Distill an article
thunderbit distill https://example.com/article --format markdown

# Save the distilled Markdown to disk
thunderbit distill https://example.com --format markdown > page.md

Bulk RAG ingestion

# Distill a docs site listed in urls.txt into one TSV row per page (URL, Markdown).
# @tsv escapes tabs and newlines so multi-line Markdown stays on a single row.
thunderbit batch distill --file urls.txt --format json \
  | jq -r '.data.results[] | select(.success == true) | [.url, .markdown] | @tsv' \
  > corpus.tsv

Discover then extract

# Step 1: AI proposes fields, you curate, schema saved to disk
thunderbit extract https://example.com/product -i --save-schema ./schema.json

# Step 2: re-use across the catalog
thunderbit batch extract --file urls.txt --schema ./schema.json --format json > products.json

CI gate — fail when extraction returns nothing

thunderbit extract "$URL" --schema ./schema.json --format json \
  | jq -e '.data.data | length > 0'

Combine with other tools

# Pull the canonical URL out of a distill response
thunderbit distill https://example.com --format json \
  | jq -r '.data.metadata.canonicalUrl'

# Pipe distilled content into a model for summarisation
thunderbit distill "$URL" --format markdown \
  | claude -p "summarise the article in 5 bullets"

# Count successful pages in a batch
thunderbit batch distill --file urls.txt --format json \
  | jq '[.data.results[] | select(.success == true)] | length'

Exit Codes

| Code | Meaning |
| --- | --- |
| 0 | Success. Result is on stdout in the format chosen by --format. |
| 1 | Any failure: missing API key, auth error, HTTP 4xx/5xx, network error, missing schema file, missing required argument. |

All error text is written to stderr. On failure, stdout stays empty (yes, even with --format json). That means a jq pipeline never receives a half-baked envelope — check the exit code (or set -e) before parsing.

Polling progress (e.g. Processing... (3) from async submit + poll) is also written to stderr. Append 2>/dev/null to mute it. Single-page sync calls (--sync) don't emit progress.
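
The exit-code contract above can be wrapped in a small guard so that downstream parsing only ever runs on a successful response. This is a minimal sketch; thunderbit_checked is a hypothetical wrapper name, and it relies only on the documented behaviour that failures exit non-zero and leave stdout empty.

```shell
# Hypothetical wrapper: run thunderbit with the given arguments and print
# its stdout only when the command succeeds. On failure, report the exit
# code on stderr and propagate it, leaving stdout empty.
thunderbit_checked() {
  if out=$(thunderbit "$@"); then
    printf '%s\n' "$out"
  else
    code=$?
    echo "thunderbit failed with exit code $code" >&2
    return "$code"
  fi
}

# Usage: jq only runs when the call succeeded.
#   result=$(thunderbit_checked distill "$URL" --format json) \
#     && printf '%s\n' "$result" | jq '.success'
```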

Troubleshooting

Error: API key is required. Export THUNDERBIT_API_KEY or pass --api-key.

Network errors behind a corporate proxy. Set HTTPS_PROXY and HTTP_PROXY — both Node and Python clients honour them.
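
For example, both variables can be exported in the shell session before running the CLI. The proxy address below is a placeholder and an assumption; substitute the host and port your network uses.

```shell
# Hypothetical proxy address; replace with your organisation's proxy.
export HTTPS_PROXY=http://proxy.example.com:8080
export HTTP_PROXY=http://proxy.example.com:8080

# Subsequent calls in this session inherit the proxy settings, e.g.:
#   thunderbit distill https://example.com
```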

Slow batch polling. Bump --timeout for the per-page budget. The polling cadence itself is fixed at a few seconds and not configurable from the CLI today.

Open Source

The Thunderbit CLI is MIT-licensed and open source on GitHub; the same repository also ships the MCP server and the Claude Code plugin. It is distributed on npm as @thunderbit/thunderbit-cli.