MCP Server

Use Thunderbit's API through the Model Context Protocol — search, scrape, and extract from any MCP-compatible AI host.

This Model Context Protocol (MCP) server integrates Thunderbit so that an AI host can distill pages into Markdown, pull structured fields out of any page with natural-language instructions, and run batch jobs of up to 100 URLs at a time. It is open source on GitHub and distributed on npm as @thunderbit/mcp-server.

Features

  • Distill any URL into clean, LLM-ready Markdown
  • Extract structured data with a flat field-name → instruction map
  • Auto-suggest extractable fields with suggest_fields
  • Run batch jobs on up to 100 URLs with poll-based status
  • Per-call API key override for multi-tenant agents
  • Self-hosted support via THUNDERBIT_API_BASE_URL

Installation

Get your API key from the Thunderbit Dashboard.

Running with npx

env THUNDERBIT_API_KEY=tb_YOUR_API_KEY npx -y @thunderbit/mcp-server

Manual installation

npm install -g @thunderbit/mcp-server

Running on Cursor

Edit ~/.cursor/mcp.json:

{
  "mcpServers": {
    "thunderbit": {
      "command": "npx",
      "args": ["-y", "@thunderbit/mcp-server"],
      "env": {
        "THUNDERBIT_API_KEY": "tb_YOUR_API_KEY"
      }
    }
  }
}

After saving, refresh the MCP server list in Cursor Settings → Features → MCP Servers. The Composer Agent will use Thunderbit when you ask about web data.

Running on Windsurf

Add this to ~/.codeium/windsurf/mcp_config.json:

{
  "mcpServers": {
    "thunderbit": {
      "command": "npx",
      "args": ["-y", "@thunderbit/mcp-server"],
      "env": {
        "THUNDERBIT_API_KEY": "tb_YOUR_API_KEY"
      }
    }
  }
}

Running on VS Code

For workspace-shared config, create .vscode/mcp.json:

{
  "inputs": [
    {
      "type": "promptString",
      "id": "apiKey",
      "description": "Thunderbit API Key",
      "password": true
    }
  ],
  "servers": {
    "thunderbit": {
      "command": "npx",
      "args": ["-y", "@thunderbit/mcp-server"],
      "env": {
        "THUNDERBIT_API_KEY": "${input:apiKey}"
      }
    }
  }
}

For user-global config, paste the same servers block into User Settings (JSON) under an mcp key.

Running on Claude Desktop

Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):

{
  "mcpServers": {
    "thunderbit": {
      "command": "npx",
      "args": ["-y", "@thunderbit/mcp-server"],
      "env": {
        "THUNDERBIT_API_KEY": "tb_YOUR_API_KEY"
      }
    }
  }
}

If you see spawn npx ENOENT, Node.js isn't on the system PATH. Install Node.js LTS from the official site and restart Claude Desktop fully. On Windows, you can also run where npx and use the absolute path (e.g. C:\Program Files\nodejs\npx.cmd) as the command value.

Running on Claude Code

claude mcp add thunderbit -e THUNDERBIT_API_KEY=tb_YOUR_API_KEY -- npx -y @thunderbit/mcp-server

Running on Cline

In Cline's MCP Servers panel, click Add Server and use:

{
  "command": "npx",
  "args": ["-y", "@thunderbit/mcp-server"],
  "env": {
    "THUNDERBIT_API_KEY": "tb_YOUR_API_KEY"
  }
}

Configuration

Environment Variables

Required

  • THUNDERBIT_API_KEY — your API key (tb_ followed by 32 hex chars). Required unless every tool call passes its own apiKey argument.

Optional

  • THUNDERBIT_API_BASE_URL — point at a self-hosted gateway. Default: https://openapi.thunderbit.com
  • THUNDERBIT_TIMEOUT_MS — per-call HTTP timeout. Default: 120000 (2 minutes). Bump this for slow batch polling.
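
As a rough illustration of how these variables and their defaults combine, here is a minimal TypeScript sketch. The `loadConfig` helper and `ThunderbitConfig` type are hypothetical names, not part of @thunderbit/mcp-server, and the case-insensitive key check is an assumption based on the "32 hex chars" description above.

```typescript
// Hypothetical helper: illustrates the documented defaults, not the server's own code.
interface ThunderbitConfig {
  apiKey: string | undefined; // may be absent when every tool call passes its own apiKey
  baseUrl: string;
  timeoutMs: number;
}

// Defaults as documented above.
const DEFAULT_BASE_URL = "https://openapi.thunderbit.com";
const DEFAULT_TIMEOUT_MS = 120000;

// Assumption: "tb_ followed by 32 hex chars" is matched case-insensitively.
const API_KEY_PATTERN = /^tb_[0-9a-fA-F]{32}$/;

function loadConfig(env: Record<string, string | undefined>): ThunderbitConfig {
  const apiKey = env["THUNDERBIT_API_KEY"];
  if (apiKey !== undefined && !API_KEY_PATTERN.test(apiKey)) {
    throw new Error("THUNDERBIT_API_KEY must be tb_ followed by 32 hex characters");
  }

  // Environment variables are strings, so the timeout has to be parsed.
  const rawTimeout = env["THUNDERBIT_TIMEOUT_MS"];
  const timeoutMs = rawTimeout === undefined ? DEFAULT_TIMEOUT_MS : Number(rawTimeout);
  if (!Number.isInteger(timeoutMs) || timeoutMs <= 0) {
    throw new Error("THUNDERBIT_TIMEOUT_MS must be a positive integer");
  }

  return {
    apiKey,
    baseUrl: env["THUNDERBIT_API_BASE_URL"] ?? DEFAULT_BASE_URL,
    timeoutMs,
  };
}
```

In a Node.js process you would typically call `loadConfig(process.env)`. Taking the environment as a plain object keeps the sketch easy to test.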

Configuration examples

For cloud API with default settings:

export THUNDERBIT_API_KEY=tb_your_key

For self-hosted instances:

export THUNDERBIT_API_KEY=tb_your_key
export THUNDERBIT_API_BASE_URL=https://openapi.your-domain.com
export THUNDERBIT_TIMEOUT_MS=300000

Custom configuration with Claude Desktop

{
  "mcpServers": {
    "thunderbit": {
      "command": "npx",
      "args": ["-y", "@thunderbit/mcp-server"],
      "env": {
        "THUNDERBIT_API_KEY": "tb_YOUR_API_KEY",
        "THUNDERBIT_API_BASE_URL": "https://openapi.thunderbit.com",
        "THUNDERBIT_TIMEOUT_MS": "180000"
      }
    }
  }
}

Per-call API key override

Every tool accepts an optional apiKey argument that overrides THUNDERBIT_API_KEY. Useful when one MCP server fronts multiple end-users:

{
  "url": "https://example.com",
  "apiKey": "tb_customer_specific_key"
}
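
The precedence described above can be sketched in TypeScript as follows. `resolveApiKey` and `withApiKey` are hypothetical helper names used only for illustration. The sketch assumes an empty per-call key should fall back to the environment key, which the documentation does not specify.

```typescript
// Hypothetical helper: a per-call apiKey wins over the environment key.
// Assumption: an empty per-call key is treated as "not provided".
function resolveApiKey(args: { apiKey?: string }, envKey: string | undefined): string {
  const key = args.apiKey !== undefined && args.apiKey !== "" ? args.apiKey : envKey;
  if (key === undefined || key === "") {
    throw new Error("No API key: set THUNDERBIT_API_KEY or pass apiKey in the tool call");
  }
  return key;
}

// Hypothetical helper for multi-tenant agents: attach a tenant's key to any
// tool arguments object without mutating the original.
function withApiKey<T extends object>(args: T, apiKey: string): T & { apiKey: string } {
  return { ...args, apiKey };
}
```

For example, `withApiKey({ url: "https://example.com" }, "tb_customer_specific_key")` produces the arguments object shown above.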

Available Tools

1. Distill Tool (thunderbit_distill)

Convert any web page into clean, LLM-ready Markdown. Costs 1 credit per call.

{
  "name": "thunderbit_distill",
  "arguments": {
    "url": "https://example.com/article",
    "renderMode": "basic",
    "waitFor": 1000,
    "includeTags": ["article", "main"],
    "excludeTags": ["nav", "footer"],
    "countryCode": "US",
    "timeout": 30000
  }
}

Distill Tool Options

  • url: Web page URL to convert (required)
  • renderMode: none | basic | full — controls JS rendering depth
  • waitFor: Wait time in ms after page load (0–10000) — bump for SPAs
  • includeTags: HTML tags to include (e.g. ["article", "main"])
  • excludeTags: HTML tags to exclude (e.g. ["nav", "footer"])
  • countryCode: ISO 2-letter country code, uppercase (default: US)
  • timeout: Request timeout in ms (5000–60000)
  • apiKey: Override env-var key for this call

Best for: Article reading, RAG ingestion, bulk page summarisation, content analysis. Returns: Markdown string.
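
If you build tool calls programmatically, it can help to check arguments against the documented ranges before sending them. The following TypeScript sketch does that. The `DistillArgs` type and `validateDistillArgs` function are hypothetical names, and whether the server rejects or clamps out-of-range values is not stated here, so treat this as a client-side convenience only.

```typescript
type RenderMode = "none" | "basic" | "full";

// Shape of the thunderbit_distill arguments as listed above.
interface DistillArgs {
  url: string;
  renderMode?: RenderMode;
  waitFor?: number;
  includeTags?: string[];
  excludeTags?: string[];
  countryCode?: string;
  timeout?: number;
  apiKey?: string;
}

// Hypothetical client-side check. Returns a list of problems; an empty list
// means every provided value falls inside the documented range.
function validateDistillArgs(args: DistillArgs): string[] {
  const problems: string[] = [];
  const inRange = (value: number, min: number, max: number): boolean =>
    Number.isFinite(value) && value >= min && value <= max;

  if (args.waitFor !== undefined && !inRange(args.waitFor, 0, 10000)) {
    problems.push("waitFor must be between 0 and 10000 ms");
  }
  if (args.timeout !== undefined && !inRange(args.timeout, 5000, 60000)) {
    problems.push("timeout must be between 5000 and 60000 ms");
  }
  if (args.countryCode !== undefined && !/^[A-Z]{2}$/.test(args.countryCode)) {
    problems.push("countryCode must be two uppercase letters");
  }
  return problems;
}
```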

2. Extract Tool (thunderbit_extract)

Extract structured data from a web page. The schema is a flat map of fieldName → natural-language instruction. Costs 20 credits per call.

Note: the upstream OpenAPI spec describes schema as JSON Schema. The live server expects the flat instruction map shown below; we're aligning the spec.

{
  "name": "thunderbit_extract",
  "arguments": {
    "url": "https://example.com/product",
    "schema": {
      "name": "product name",
      "price": "the listed price as a number",
      "currency": "3-letter currency code",
      "inStock": "true if the product is available, false otherwise"
    },
    "renderMode": "basic"
  }
}

The result data.data is always an array, even when you only expect a single record:

{
  "data": {
    "url": "https://example.com/product",
    "data": [
      { "name": "iPhone 15 Pro", "price": 999, "currency": "USD", "inStock": true }
    ]
  }
}
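
Because data.data is always an array, code that expects a single record has to unwrap it. A minimal TypeScript sketch follows, assuming the tool result has already been parsed from JSON. `ExtractResponse`, `Product`, and `firstRecord` are hypothetical names chosen for this example.

```typescript
// Shape of the extract result shown above, generic over the record type.
interface ExtractResponse<T> {
  data: {
    url: string;
    data: T[];
  };
}

// Record type matching the example schema (name, price, currency, inStock).
interface Product {
  name: string;
  price: number;
  currency: string;
  inStock: boolean;
}

// Hypothetical helper: returns the first record, or undefined when the
// page yielded no records at all.
function firstRecord<T>(response: ExtractResponse<T>): T | undefined {
  return response.data.data[0];
}

const response: ExtractResponse<Product> = {
  data: {
    url: "https://example.com/product",
    data: [{ name: "iPhone 15 Pro", price: 999, currency: "USD", inStock: true }],
  },
};

const product = firstRecord(response);
```

Returning `undefined` for an empty array forces callers to handle the case where nothing matched.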

Extract Tool Options

  • url: Web page URL (required)
  • schema: Flat Record<string, string> — field name → instruction (required)
  • renderMode: none | basic | full
  • waitFor: Wait time in ms after page load (0–10000)
  • timeout: Request timeout in ms (5000–120000)
  • apiKey: Per-call key override

Best for: Lead gen, price monitoring, competitive analysis, dataset building. Returns: data.data array of objects keyed by your schema's field names.

3. Suggest Fields Tool (thunderbit_suggest_fields)

Analyse a page and propose extractable fields. Use this first when you don't know what data a page contains. Costs 1 credit per call.

{
  "name": "thunderbit_suggest_fields",
  "arguments": {
    "url": "https://example.com/product",
    "prompt": "Focus on pricing, availability, and shipping",
    "countryCode": "US"
  }
}

Suggest Fields Tool Options

  • url: Web page URL to analyse (required)
  • prompt: Optional steering hint for the AI
  • countryCode: ISO 2-letter country code (default: US)
  • apiKey: Per-call key override

Best for: Discovering schema before running extract; bootstrapping new scrape targets. Returns: Array of { name, type, instruction } objects, ready to feed into thunderbit_extract.
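
To feed the suggestions into thunderbit_extract, they need to be flattened into a field name → instruction map. One way to do that in TypeScript is sketched below. `SuggestedField` and `toExtractSchema` are hypothetical names, and the sample `type` values are placeholders, since the set of possible types is not listed here.

```typescript
// One suggestion as returned by thunderbit_suggest_fields.
interface SuggestedField {
  name: string;
  type: string;
  instruction: string;
}

// Hypothetical helper: builds the flat schema that thunderbit_extract expects.
// The type field is dropped because the flat map carries only instructions.
function toExtractSchema(fields: SuggestedField[]): Record<string, string> {
  const schema: Record<string, string> = {};
  for (const field of fields) {
    schema[field.name] = field.instruction;
  }
  return schema;
}

// Sample input; the type values here are illustrative placeholders.
const suggestions: SuggestedField[] = [
  { name: "name", type: "string", instruction: "product name" },
  { name: "price", type: "number", instruction: "the listed price as a number" },
];

const schema = toExtractSchema(suggestions);
```

The resulting `schema` object can be passed as the schema argument of thunderbit_extract or thunderbit_batch_extract_create.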

4. Batch Distill Create (thunderbit_batch_distill_create)

Submit up to 100 URLs for distillation. Returns a job ID — poll thunderbit_batch_distill_status until complete. Costs 1 credit per URL.

{
  "name": "thunderbit_batch_distill_create",
  "arguments": {
    "urls": [
      "https://example.com/page-1",
      "https://example.com/page-2",
      "https://example.com/page-3"
    ],
    "timeout": 30000
  }
}

Batch Distill Create Options

  • urls: Array of URLs (1–100, required)
  • timeout: Per-page request timeout in ms (5000–60000)
  • apiKey: Per-call key override

Best for: Bulk RAG ingestion, full-site distillation when paired with the Map endpoint. Returns: { id, status, total } — pass id to the status tool.
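
When you have more than 100 URLs, the list has to be split across several batch jobs. A small TypeScript sketch of that chunking step, with `chunkUrls` as a hypothetical helper name:

```typescript
// Documented per-job limit.
const MAX_BATCH_URLS = 100;

// Hypothetical helper: splits a URL list into chunks that each fit one
// batch job. Order is preserved.
function chunkUrls(urls: string[], size: number = MAX_BATCH_URLS): string[][] {
  if (!Number.isInteger(size) || size < 1 || size > MAX_BATCH_URLS) {
    throw new RangeError("size must be an integer between 1 and " + MAX_BATCH_URLS);
  }
  const chunks: string[][] = [];
  for (let i = 0; i < urls.length; i += size) {
    chunks.push(urls.slice(i, i + size));
  }
  return chunks;
}
```

Each chunk then becomes the urls argument of one thunderbit_batch_distill_create call, and each call returns its own job id to poll.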

5. Batch Distill Status (thunderbit_batch_distill_status)

Poll a batch distill job and retrieve paginated results. Free.

{
  "name": "thunderbit_batch_distill_status",
  "arguments": {
    "jobId": "550e8400-e29b-41d4-a716-446655440000",
    "page": 0,
    "pageSize": 20
  }
}

Batch Distill Status Options

  • jobId: Job ID from thunderbit_batch_distill_create (required)
  • page: 0-based page index (default 0)
  • pageSize: Results per page, 1–100 (default 20)
  • apiKey: Per-call key override

Best for: Polling and final-result retrieval. Increment page until results is empty.

Returns: { id, status, total, completed, failed, creditsUsed, createdAt, completedAt, results: [{ index, url, success, markdown, error }, …] }.

The job-level status is an enum (PENDING / PROCESSING / COMPLETED / FAILED / CANCELLED). Each per-URL result item reports its outcome through a boolean success field, not a status.
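
The poll-then-paginate pattern can be sketched in TypeScript as below. This is a minimal sketch under stated assumptions: `fetchPage` stands in for however your host invokes thunderbit_batch_distill_status, `waitForJob` and the type names are hypothetical, COMPLETED, FAILED, and CANCELLED are assumed to be the terminal statuses, and the 5-second interval is an arbitrary choice.

```typescript
type JobStatus = "PENDING" | "PROCESSING" | "COMPLETED" | "FAILED" | "CANCELLED";

interface BatchResultItem {
  index: number;
  url: string;
  success: boolean;
  markdown?: string | null; // assumption: may be missing or null on failure
  error?: string | null; // assumption: may be missing or null on success
}

// Subset of the status response; timestamps and creditsUsed are omitted.
interface BatchStatusPage {
  id: string;
  status: JobStatus;
  total: number;
  completed: number;
  failed: number;
  results: BatchResultItem[];
}

// Assumption: these three statuses mean the job will not change any more.
function isTerminal(status: JobStatus): boolean {
  return status === "COMPLETED" || status === "FAILED" || status === "CANCELLED";
}

// Polls page 0 until the job is terminal, then walks the pages until an
// empty results array is returned. fetchPage and sleep are injected so the
// sketch does not depend on any particular MCP client.
async function waitForJob(
  fetchPage: (page: number) => Promise<BatchStatusPage>,
  sleep: (ms: number) => Promise<void>,
  intervalMs: number = 5000,
  maxPolls: number = 60,
): Promise<BatchResultItem[]> {
  let polls = 0;
  let current = await fetchPage(0);
  while (!isTerminal(current.status)) {
    polls += 1;
    if (polls >= maxPolls) {
      throw new Error("job did not reach a terminal status in time");
    }
    await sleep(intervalMs);
    current = await fetchPage(0);
  }

  const all: BatchResultItem[] = [];
  let page = 0;
  while (current.results.length > 0) {
    all.push(...current.results);
    page += 1;
    current = await fetchPage(page);
  }
  return all;
}
```

Capping the number of polls keeps a stuck job from blocking the caller forever. Check each item's success flag before reading markdown.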

6. Batch Extract Create (thunderbit_batch_extract_create)

Submit up to 100 URLs for extraction with a single shared schema. Costs 20 credits per URL.

{
  "name": "thunderbit_batch_extract_create",
  "arguments": {
    "urls": [
      "https://example.com/product-1",
      "https://example.com/product-2"
    ],
    "schema": {
      "name": "product name",
      "price": "the listed price as a number"
    },
    "timeout": 60000
  }
}

Batch Extract Create Options

  • urls: Array of URLs (1–100, required)
  • schema: Flat Record<string, string> (field → instruction) applied to every URL (required)
  • timeout: Per-page request timeout in ms (5000–120000)
  • apiKey: Per-call key override

Best for: Catalog scraping, large-scale dataset building. Returns: { id, status, total }.
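
Since extraction costs 20 credits per URL, it is worth estimating a batch's cost before submitting it. The TypeScript sketch below uses the per-call prices stated in this document. `estimateCredits` is a hypothetical helper, and it gives a simple product of price and count rather than an exact bill.

```typescript
// Credit prices as stated in this document. Status polling is free.
const CREDITS_PER_URL = {
  distill: 1,
  extract: 20,
  suggestFields: 1,
} as const;

type Operation = keyof typeof CREDITS_PER_URL;

// Hypothetical helper: credits for running one operation over urlCount URLs.
// A single (non-batch) call counts as one URL.
function estimateCredits(operation: Operation, urlCount: number): number {
  if (!Number.isInteger(urlCount) || urlCount < 0) {
    throw new RangeError("urlCount must be a non-negative integer");
  }
  return CREDITS_PER_URL[operation] * urlCount;
}
```

A full batch extract of 100 URLs comes to `estimateCredits("extract", 100)`, which is 2000 credits. The creditsUsed field in the status response reports what a job actually consumed.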

7. Batch Extract Status (thunderbit_batch_extract_status)

Poll a batch extract job. Free.

{
  "name": "thunderbit_batch_extract_status",
  "arguments": {
    "jobId": "550e8400-e29b-41d4-a716-446655440000",
    "page": 0,
    "pageSize": 20
  }
}

Takes the same options as thunderbit_batch_distill_status and returns paginated extracted data per URL.

A typical workflow chains the tools in this order:

  1. thunderbit_suggest_fields — see what data the page exposes
  2. thunderbit_extract (or thunderbit_distill) — pull a single URL
  3. thunderbit_batch_*_create — fan out up to 100 URLs
  4. thunderbit_batch_*_status — poll until terminal

Error Handling

Every tool returns errors as MCP tool errors (isError: true) with a structured hint, so the model can decide whether to retry or surface the failure to the user.

// Pseudo-code: how the host receives an error
{
  isError: true,
  content: [{
    type: "text",
    text: "Thunderbit API error (402): INSUFFICIENT_CREDITS — Top up at https://thunderbit.com/billing"
  }]
}

| HTTP | Code | Hint emitted by the server |
| --- | --- | --- |
| 401 | API_KEY_INVALID_FORMAT / API_KEY_NOT_FOUND | "Check your API key at https://app.thunderbit.com/console" |
| 402 | INSUFFICIENT_CREDITS | "Top up at https://thunderbit.com/billing" |
| 429 | RATE_LIMIT_EXCEEDED | "Rate limit exceeded, retry later" |
| 5xx | INTERNAL_ERROR / DISTILL_FAILED | (no hint, server message passed through) |

Full code list: API Reference → Errors.
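
If your application wants to act on these errors itself, it could parse the error text and decide whether a retry makes sense. The TypeScript sketch below assumes the text always follows the "Thunderbit API error (status): CODE" pattern from the example above, which may not hold for every message. `parseToolError` is a hypothetical helper, and treating 429 and 5xx as retryable is a policy choice of this sketch, not documented server behaviour.

```typescript
interface ThunderbitToolError {
  httpStatus: number;
  code: string;
  retryable: boolean;
}

// Assumption: error text starts like "Thunderbit API error (402): INSUFFICIENT_CREDITS".
const ERROR_PATTERN = /Thunderbit API error \((\d{3})\):\s*([A-Z_]+)/;

// Hypothetical helper: returns undefined when the text does not match the
// assumed pattern, so callers can fall back to showing the raw message.
function parseToolError(text: string): ThunderbitToolError | undefined {
  const match = ERROR_PATTERN.exec(text);
  if (match === null) {
    return undefined;
  }
  const [, statusText = "", code = ""] = match;
  const httpStatus = Number(statusText);
  return {
    httpStatus,
    code,
    // Policy of this sketch: rate limits and server errors may succeed later;
    // bad keys (401) and missing credits (402) need user action first.
    retryable: httpStatus === 429 || httpStatus >= 500,
  };
}
```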

Troubleshooting

Tools don't show up in the host. Restart the host fully after editing the config. The server logs a startup line on stderr — check the host's MCP log file.

Cannot find module '@thunderbit/mcp-server'. Either pre-install the package (npm install -g @thunderbit/mcp-server) or allow npx network access from the host's sandbox.

401 on first run. The env block is per-server, so verify the key isn't sitting under the wrong server entry. To isolate the problem, try the manual install path with the key exported in your shell.

Batch jobs stall. The MCP tool only submits and polls; it doesn't keep a long-running connection. Bump THUNDERBIT_TIMEOUT_MS to 180000+ for large batches, or pair batch_*_create with a webhook in your application to fire-and-forget.

Build your own

Want a custom tool surface (different prompts, scoping, filtering, extra tools)? See the MCP integration guide for a from-scratch server build with @modelcontextprotocol/sdk.

Open Source

@thunderbit/mcp-server is MIT-licensed and open source on GitHub.