Skills + CLI

Run Thunderbit from your terminal: distill pages to Markdown, extract structured data, get field suggestions, and process up to 100 URLs in one batch. The CLI works on its own or as a skills toolkit that AI coding agents can discover.

Installation

The CLI is published on npm as @thunderbit/thunderbit-cli and puts a thunderbit binary on your PATH.

# Install globally
npm install -g @thunderbit/thunderbit-cli

# Or run one-shot via npx
npx -y @thunderbit/thunderbit-cli --help

A Python version with the same command surface (pip install thunderbit) is on the roadmap.

Authentication

Authenticate with a Thunderbit API key before using the CLI. Get a key from app.thunderbit.com/console. Format: tb_ followed by 32 hexadecimal digits.
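The key format above can be checked locally before any request is sent. The sketch below is a minimal illustration, not part of the CLI: is_thunderbit_key is a hypothetical helper name, and accepting both upper- and lowercase hex digits is an assumption, since the format description does not say which case the console issues.

```shell
# Hypothetical helper: succeeds only if $1 looks like "tb_" + 32 hex digits.
# Assumption: both lowercase and uppercase hex digits are accepted.
is_thunderbit_key() {
  printf '%s\n' "$1" | grep -Eq '^tb_[0-9a-fA-F]{32}$'
}

# Usage (commented out so the block has no side effects):
# is_thunderbit_key "$THUNDERBIT_API_KEY" || echo "key does not match tb_ + 32 hex" >&2
```

A check like this only catches typos and truncated pastes; whether the key is actually valid is still decided by the server.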

Set via environment variable

export THUNDERBIT_API_KEY=tb_YOUR_API_KEY

Pass per command

thunderbit --api-key tb_YOUR_API_KEY distill https://example.com

Self-hosting / local development

Override the base URL to target a self-hosted Thunderbit gateway:

# Per call
thunderbit --base-url https://api.your-domain.com distill https://example.com

# Or set via environment variable
export THUNDERBIT_API_BASE_URL=https://api.your-domain.com
thunderbit distill https://example.com

Check the version

thunderbit --version
# or
thunderbit -V

Global options

These flags are available on every command:

Option	Description
`--api-key <key>`, `-k`	API key (or set `THUNDERBIT_API_KEY`)
`--base-url <url>`	API base URL (or set `THUNDERBIT_API_BASE_URL`)
`--format <format>`, `-f`	Output format: `json`, `table`, or `markdown` (default `json`)
`--version`, `-V`	Print the CLI version
`--help`, `-h`	Show command help

Commands

Distill

Distill a single URL into clean, LLM-ready Markdown.

# Basic usage
thunderbit distill https://example.com/article

# Stream Markdown to stdout
thunderbit distill https://example.com --format markdown

# Save to file
thunderbit distill https://example.com --format markdown > article.md

Distill options

# Use the basic JS renderer (covers most modern sites)
thunderbit distill https://example.com --render-mode basic

# Use the full headless browser (slowest, highest fidelity)
thunderbit distill https://example.com --render-mode full

# Geo-target for region-aware sites
thunderbit distill https://example.com --country-code DE

# Bump per-page timeout
thunderbit distill https://example.com --timeout 60000

# Use sync /distill instead of the default async submit + poll
thunderbit distill https://example.com --sync

Available options:

Option	Default	Description
`--render-mode <mode>`	`none`	`none`, `basic`, or `full`
`--timeout <ms>`	`30000`	Per-page request timeout in milliseconds
`--country-code <CC>`	`US`	Two-letter ISO country code (uppercase)
`--sync`	`false`	Use sync mode instead of the default async submit + poll
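Because the render modes trade speed for fidelity, one possible pattern is to start with the cheapest mode and escalate only when needed. The sketch below is an illustration under assumptions: distill_with_fallback is a hypothetical wrapper, and treating a non-zero exit or an empty Markdown body as "try the next mode" is a heuristic of this example, not documented CLI behavior. It uses only the flags listed in the table above.

```shell
# Hypothetical wrapper: try render modes from cheapest to most thorough and
# stop at the first one that exits 0 with a non-empty Markdown body.
# Assumption: an empty body means the page needs heavier rendering.
distill_with_fallback() {
  url=$1
  for mode in none basic full; do
    if body=$(thunderbit distill "$url" --render-mode "$mode" --format markdown 2>/dev/null) \
       && [ -n "$body" ]; then
      printf '%s\n' "$body"
      return 0
    fi
  done
  echo "distill failed for $url in all render modes" >&2
  return 1
}

# Usage (commented out so the block has no side effects):
# distill_with_fallback https://example.com/article > article.md
```

Each failed attempt is still a request, so a wrapper like this may cost more than picking the right mode up front for sites you already know need JavaScript rendering.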

Extract

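Extract structured data from a page. The schema is a flat map of fieldName → natural-language instruction: each value is a hint that helps the AI find that field on the page.

Note: the examples in the upstream OpenAPI spec show JSON Schema ({type:"object",properties:…}). At the time of writing, the production server expects the flat instruction map shown below, and the spec is being brought in line.

# Inline schema: a flat map of field → instruction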
thunderbit extract https://example.com/product \
  --schema '{"name":"product name","price":"the listed price as a number","currency":"3-letter currency code"}'

# Read the schema from a file
thunderbit extract https://example.com/product --schema ./schema.json

# Save the extracted JSON
thunderbit extract https://example.com/product --schema ./schema.json --format json -o data.json
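The commands above reference ./schema.json without showing its contents. As a minimal sketch, the file could hold the same flat field → instruction map as the inline example; write_product_schema is a hypothetical helper name used only for illustration.

```shell
# Hypothetical helper: write a flat field -> instruction schema to the path in $1.
# The three fields mirror the inline --schema example.
write_product_schema() {
  cat > "$1" <<'EOF'
{
  "name": "product name",
  "price": "the listed price as a number",
  "currency": "3-letter currency code"
}
EOF
}

# Usage (commented out so the block has no side effects):
# write_product_schema ./schema.json
# thunderbit extract https://example.com/product --schema ./schema.json
```

Keeping the schema in a file makes it easier to review and to reuse across single-page and batch runs.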

In the response, data.data is always an array, with one element per page region that matches the schema:

{
  "success": true,
  "data": {
    "url": "https://example.com/product",
    "data": [
      { "name": "iPhone 15 Pro", "price": 999, "currency": "USD" }
    ]
  }
}

Interactive mode

# AI proposes fields, you toggle/edit, then extraction runs with the curated schema
thunderbit extract https://example.com/product --interactive

# Steer the suggestion with a prompt
thunderbit extract https://example.com/product -i --prompt "focus on pricing and availability"

# Persist the schema for reuse
thunderbit extract https://example.com/product -i --save-schema ./product-schema.json

Extract options

# Use the full headless browser for SPAs
thunderbit extract https://example.com --schema ./schema.json --render-mode full

# Sync mode
thunderbit extract https://example.com --schema ./schema.json --sync

# Longer timeout for complex pages
thunderbit extract https://example.com --schema ./schema.json --timeout 120000

Available options:

Option	Default	Description
`--schema <json-or-file>`	(none)	Inline JSON or path to a schema file
`--interactive`, `-i`	`false`	Suggest, curate, and extract in one pass
`--prompt <text>`	(none)	Hint for the AI suggestion (use with `-i`)
`--render-mode <mode>`	`none`	`none`, `basic`, or `full`
`--timeout <ms>`	`60000`	Per-page request timeout in milliseconds
`--sync`	`false`	Sync mode
`--save-schema <file>`	(none)	Save the final schema for reuse

Suggest Fields

Before writing a schema, have the AI suggest which fields can be extracted.

# Basic
thunderbit suggest-fields https://example.com/product

# Steer with a prompt
thunderbit suggest-fields https://example.com/listings --prompt "extract job postings only"

# Region-aware
thunderbit suggest-fields https://example.com --country-code DE

In the interactive editor, toggle fields by number (1 3 5), use add, rm 2, or edit 4, then confirm with done. suggest-fields returns [{name, type, instruction}, …]. To pass that output to extract, convert it to a flat map first:

thunderbit suggest-fields "$URL" --format json \
  | jq 'map({(.name): .instruction}) | add' > schema.json

thunderbit extract "$URL" --schema ./schema.json

Available options:

Option	Default	Description
`--prompt <text>`	(none)	Steering hint
`--country-code <CC>`	`US`	Two-letter ISO country code

Batch Distill

Submit up to 100 URLs as a single batch job. By default the CLI submits the job and polls until it reaches COMPLETED, FAILED, or CANCELLED.

# URLs as positional args
thunderbit batch distill https://a.com https://b.com https://c.com

# Or read URLs from a file (one per line)
thunderbit batch distill --file urls.txt

# Submit only — print the job ID and exit (use webhook or poll later)
thunderbit batch distill --file urls.txt --no-poll

Batch Distill options

# Bump per-page timeout
thunderbit batch distill --file urls.txt --timeout 60000

# Pipe results into another tool
thunderbit batch distill --file urls.txt --format json \
  | jq -r '.data.results[] | select(.success == true) | .markdown' \
  > distilled.md

Available options:

Option	Default	Description
`--file <path>`	(none)	Read URLs from a file (one per line)
`--timeout <ms>`	`30000`	Per-page request timeout in milliseconds
`--no-poll`	`false`	Submit only, print the job ID, and exit
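Since a batch job takes at most 100 URLs, a longer list has to be split before submission. The sketch below shows one way to do that with standard tools; chunk_urls is a hypothetical helper, and dropping blank lines and duplicate URLs is a choice of this example rather than something the CLI is documented to require.

```shell
# Hypothetical helper: split a URL list into files of at most 100 URLs each.
# $1: input file (one URL per line), $2: output directory
chunk_urls() {
  input=$1
  outdir=$2
  mkdir -p "$outdir" || return 1
  # awk drops blank lines and repeated URLs while keeping the original order;
  # split then writes 100-line pieces named chunk_aaa, chunk_aab, ...
  awk 'NF && !seen[$0]++' "$input" | split -l 100 -a 3 - "$outdir/chunk_"
}

# Usage (commented out so the block has no side effects):
# chunk_urls urls.txt ./chunks
# for f in ./chunks/chunk_*; do
#   thunderbit batch distill --file "$f" --format json > "$f.json"
# done
```

Each chunk becomes its own job, so results and failures are tracked per chunk rather than for the whole list.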

Cancel a running batch distill job

thunderbit batch cancel-distill <jobId>

Completed pages keep their results. Pending pages are dropped and are no longer billed. Once the server acknowledges the request, the status changes to CANCELLED.

Batch Extract

Submit up to 100 URLs with a shared schema.

# URLs as positional args + schema file
thunderbit batch extract https://a.com https://b.com \
  --schema ./schema.json

# Read URLs from a file
thunderbit batch extract --file urls.txt --schema ./schema.json

# Submit only
thunderbit batch extract --file urls.txt --schema ./schema.json --no-poll

Available options:

Option	Default	Description
`--file <path>`	(none)	Read URLs from a file (one per line)
`--schema <json-or-file>`	(none)	Inline JSON or path to a schema file (required)
`--timeout <ms>`	`60000`	Per-page request timeout in milliseconds
`--no-poll`	`false`	Submit only, print the job ID, and exit

Cancel a running batch extract job

thunderbit batch cancel-extract <jobId>

The semantics match cancel-distill: completed rows are kept, pending rows are dropped, and billing for the remainder stops.

Output handling

The CLI writes to stdout by default, so its output is easy to pipe or redirect.

# Pipe Markdown into another tool
thunderbit distill https://example.com --format markdown | head -50

# Redirect to a file
thunderbit distill https://example.com --format markdown > output.md

# Save extraction JSON
thunderbit extract https://example.com --schema ./schema.json --format json > data.json

Format behavior

--format json (default): prints the full API response as compact JSON, including success, data, creditsUsed, and so on. Suitable for piping into jq.
--format markdown: distill prints the raw Markdown body; all other commands return the full JSON.
--format table: prints tabular results (extract, suggest-fields) as an ASCII table.
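Given these behaviors, a script may want a readable table when a person is watching and JSON when the output is piped. The sketch below is one way to pick between them; choose_format and the tb_format variable are hypothetical names for this example, and the CLI does not read tb_format itself.

```shell
# Hypothetical helper: set tb_format to "table" when stdout is a terminal,
# and to "json" when stdout is a pipe or a file.
# It sets a variable instead of printing, because inside $(...) stdout is
# always a pipe and the terminal check would never succeed.
choose_format() {
  if [ -t 1 ]; then
    tb_format=table
  else
    tb_format=json
  fi
}

# Usage (commented out so the block has no side effects):
# choose_format
# thunderbit extract "$URL" --schema ./schema.json --format "$tb_format"
```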

# Markdown body to stdout
thunderbit distill https://example.com --format markdown

# Full structured response
thunderbit distill https://example.com --format json

Examples

Quick Distill

# Distill an article
thunderbit distill https://docs.thunderbit.com/introduction --format markdown

# Save the distilled Markdown to disk
thunderbit distill https://example.com --format markdown -o page.md

Bulk RAG ingestion

# Distill a docs site listed in urls.txt into one TSV row per page
# (@tsv escapes the tabs and newlines inside the Markdown so each page stays on one row)
thunderbit batch distill --file urls.txt --format json \
  | jq -r '.data.results[] | select(.success == true) | [.url, .markdown] | @tsv' \
  > corpus.tsv

Discover, then extract

# Step 1: AI proposes fields, you curate, schema saved to disk
thunderbit extract https://example.com/product -i --save-schema ./schema.json

# Step 2: re-use across the catalog
thunderbit batch extract --file urls.txt --schema ./schema.json --format json > products.json

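CI gate: fail when the extraction is empty

# The extracted rows live in .data.data; jq -e exits non-zero when the test is false
thunderbit extract "$URL" --schema ./schema.json --format json \
  | jq -e '.data.data | length > 0'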

Combining with other tools

# Read the canonical URL from the distill metadata
thunderbit distill https://example.com --format json \
  | jq -r '.data.metadata.canonicalUrl'

# Pipe distilled content into a model for summarisation
thunderbit distill "$URL" --format markdown \
  | claude -p "summarise the article in 5 bullets"

# Count successful pages in a batch
thunderbit batch distill --file urls.txt --format json \
  | jq '[.data.results[] | select(.success == true)] | length'

Exit Code

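Code	Meaning
`0`	Success. The result is written to `stdout` in the format selected with `--format`.
`1`	Any failure: missing API key, authentication error, HTTP 4xx/5xx, network error, missing schema file, or missing required argument.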

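All error text is written to stderr. On failure, stdout stays empty, even with --format json. A jq pipeline therefore never receives a partial envelope, but you should still check the exit code before parsing (or use set -e, together with set -o pipefail when the CLI runs inside a pipeline).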
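The exit-code contract above can be wrapped in a small guard so that downstream parsing only runs on success. The sketch below is a generic illustration: run_guarded is a hypothetical helper that works with any command, and nothing in it is specific to the CLI beyond relying on a non-zero exit code for failures.

```shell
# Hypothetical helper: run the given command, print its stdout only if it
# exited 0, and otherwise report and return the original exit code.
run_guarded() {
  if output=$("$@"); then
    printf '%s\n' "$output"
  else
    status=$?
    printf 'command failed with exit code %s\n' "$status" >&2
    return "$status"
  fi
}

# Usage (commented out so the block has no side effects):
# if json=$(run_guarded thunderbit extract "$URL" --schema ./schema.json --format json); then
#   printf '%s\n' "$json" | jq '.data.data'
# fi
```

Capturing the output first avoids the usual pipeline pitfall where only the exit status of the last command, here jq, is visible to the shell.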

Polling progress (for example Processing... (3) during async submit + poll) is also written to stderr. Append 2>/dev/null to the command to silence it. Single-page calls in sync mode (--sync) print no progress.

Troubleshooting

Error: API key is required. Export THUNDERBIT_API_KEY or pass --api-key.

Network errors behind a corporate proxy. Set HTTPS_PROXY and HTTP_PROXY; both the Node and Python clients honor them.

Batch polling is slow. Raise the per-page budget with --timeout. The polling interval itself is fixed at a few seconds and currently cannot be configured from the CLI.

Open source

The Thunderbit CLI is open source on GitHub under the MIT license: thunderbit-open/thunderbit-mcp-server (the same repository also contains the MCP server and the Claude Code plugin). npm distribution: @thunderbit/thunderbit-cli.
