MCP Server
Use Thunderbit's API through the Model Context Protocol — search, scrape, and extract from any MCP-compatible AI host.
A Model Context Protocol (MCP) server that integrates Thunderbit for distilling pages into Markdown, pulling structured fields out of any page with natural-language instructions, and running batch jobs of up to 100 URLs at a time. Open-source on GitHub and distributed on npm as @thunderbit/mcp-server.
Features
- Distill any URL into clean, LLM-ready Markdown
- Extract structured data with a flat field-name → instruction map
- Auto-suggest extractable fields with thunderbit_suggest_fields
- Run batch jobs on up to 100 URLs with poll-based status
- Per-call API key override for multi-tenant agents
- Self-hosted support via THUNDERBIT_API_BASE_URL
Installation
Get your API key from the Thunderbit Dashboard.
Running with npx
env THUNDERBIT_API_KEY=tb_YOUR_API_KEY npx -y @thunderbit/mcp-server
Manual installation
npm install -g @thunderbit/mcp-server
Running on Cursor
Edit ~/.cursor/mcp.json:
{
"mcpServers": {
"thunderbit": {
"command": "npx",
"args": ["-y", "@thunderbit/mcp-server"],
"env": {
"THUNDERBIT_API_KEY": "tb_YOUR_API_KEY"
}
}
}
}
After saving, refresh the MCP server list in Cursor Settings → Features → MCP Servers. Cursor's Composer Agent can then call the Thunderbit tools when you ask about web data.
Running on Windsurf
Add this to ~/.codeium/windsurf/mcp_config.json:
{
"mcpServers": {
"thunderbit": {
"command": "npx",
"args": ["-y", "@thunderbit/mcp-server"],
"env": {
"THUNDERBIT_API_KEY": "tb_YOUR_API_KEY"
}
}
}
}
Running on VS Code
For workspace-shared config, create .vscode/mcp.json:
{
"inputs": [
{
"type": "promptString",
"id": "apiKey",
"description": "Thunderbit API Key",
"password": true
}
],
"servers": {
"thunderbit": {
"command": "npx",
"args": ["-y", "@thunderbit/mcp-server"],
"env": {
"THUNDERBIT_API_KEY": "${input:apiKey}"
}
}
}
}
For user-global config, paste the same servers block into User Settings (JSON) under an mcp key.
Running on Claude Desktop
Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):
{
"mcpServers": {
"thunderbit": {
"command": "npx",
"args": ["-y", "@thunderbit/mcp-server"],
"env": {
"THUNDERBIT_API_KEY": "tb_YOUR_API_KEY"
}
}
}
}
If you see spawn npx ENOENT, Node.js isn't on the system PATH. Install the Node.js LTS release from the official site and fully restart Claude Desktop. On Windows, you can also run where npx and use the absolute path it prints (e.g. C:\Program Files\nodejs\npx.cmd) as the command value.
Running on Claude Code
claude mcp add thunderbit -e THUNDERBIT_API_KEY=tb_YOUR_API_KEY -- npx -y @thunderbit/mcp-server
Running on Cline
In Cline's MCP Servers panel, click Add Server and use:
{
"command": "npx",
"args": ["-y", "@thunderbit/mcp-server"],
"env": {
"THUNDERBIT_API_KEY": "tb_YOUR_API_KEY"
}
}
Configuration
Environment Variables
Required
- THUNDERBIT_API_KEY: your API key (tb_ followed by 32 hex characters). Required unless every tool call passes its own apiKey argument.
Optional
- THUNDERBIT_API_BASE_URL: points the server at a self-hosted gateway. Default: https://openapi.thunderbit.com
- THUNDERBIT_TIMEOUT_MS: per-call HTTP timeout in milliseconds. Default: 120000 (2 minutes). Raise it for slow batch polling.
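The precedence rules above can be sketched as a small resolver. This is a hypothetical helper: resolveThunderbitConfig is not part of @thunderbit/mcp-server's documented API. Accepting upper-case hex digits in the key is an assumption. The sketch only applies the documented rules: a per-call apiKey wins over THUNDERBIT_API_KEY, and the base URL and timeout fall back to their defaults.

```typescript
interface ThunderbitConfig {
  apiKey: string;
  baseUrl: string;
  timeoutMs: number;
}

const DEFAULT_BASE_URL = "https://openapi.thunderbit.com";
const DEFAULT_TIMEOUT_MS = 120_000; // 2 minutes
// Documented format: "tb_" + 32 hex characters. Accepting upper-case hex is an assumption.
const API_KEY_PATTERN = /^tb_[0-9a-fA-F]{32}$/;

// Hypothetical resolver: per-call apiKey > THUNDERBIT_API_KEY; other settings fall back to defaults.
function resolveThunderbitConfig(
  env: Record<string, string | undefined>,
  perCallApiKey?: string,
): ThunderbitConfig {
  const apiKey = perCallApiKey ?? env["THUNDERBIT_API_KEY"];
  if (!apiKey) {
    throw new Error("No API key: set THUNDERBIT_API_KEY or pass apiKey on the tool call");
  }
  if (!API_KEY_PATTERN.test(apiKey)) {
    throw new Error("API key must be tb_ followed by 32 hex characters");
  }
  // Environment values are strings, so the timeout has to be parsed.
  const rawTimeout = env["THUNDERBIT_TIMEOUT_MS"];
  const timeoutMs = rawTimeout === undefined ? DEFAULT_TIMEOUT_MS : Number(rawTimeout);
  if (!Number.isInteger(timeoutMs) || timeoutMs <= 0) {
    throw new Error(`THUNDERBIT_TIMEOUT_MS must be a positive integer, got "${rawTimeout}"`);
  }
  return {
    apiKey,
    baseUrl: env["THUNDERBIT_API_BASE_URL"] ?? DEFAULT_BASE_URL,
    timeoutMs,
  };
}
```

In a Node.js process you would pass process.env as env, and the tool call's apiKey argument (if any) as perCallApiKey. Environment values always arrive as strings. That is why THUNDERBIT_TIMEOUT_MS is quoted ("180000") in the JSON config examples.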
Configuration examples
For cloud API with default settings:
export THUNDERBIT_API_KEY=tb_your_key
For self-hosted instances:
export THUNDERBIT_API_KEY=tb_your_key
export THUNDERBIT_API_BASE_URL=https://openapi.your-domain.com
export THUNDERBIT_TIMEOUT_MS=300000
Custom configuration with Claude Desktop
{
"mcpServers": {
"thunderbit": {
"command": "npx",
"args": ["-y", "@thunderbit/mcp-server"],
"env": {
"THUNDERBIT_API_KEY": "tb_YOUR_API_KEY",
"THUNDERBIT_API_BASE_URL": "https://openapi.thunderbit.com",
"THUNDERBIT_TIMEOUT_MS": "180000"
}
}
}
}
Per-call API key override
Every tool accepts an optional apiKey argument that overrides THUNDERBIT_API_KEY. Useful when one MCP server fronts multiple end-users:
{
"url": "https://example.com",
"apiKey": "tb_customer_specific_key"
}
Available Tools
1. Distill Tool (thunderbit_distill)
Convert any web page into clean, LLM-ready Markdown. Costs 1 credit per call.
{
"name": "thunderbit_distill",
"arguments": {
"url": "https://example.com/article",
"renderMode": "basic",
"waitFor": 1000,
"includeTags": ["article", "main"],
"excludeTags": ["nav", "footer"],
"countryCode": "US",
"timeout": 30000
}
}
Distill Tool Options
- url: Web page URL to convert (required)
- renderMode: none | basic | full; controls JS rendering depth
- waitFor: Wait time in ms after page load (0–10000); raise it for SPAs
- includeTags: HTML tags to include (e.g. ["article", "main"])
- excludeTags: HTML tags to exclude (e.g. ["nav", "footer"])
- countryCode: ISO 2-letter country code, uppercase (default: US)
- timeout: Request timeout in ms (5000–60000)
- apiKey: Override the env-var key for this call
Best for: Article reading, RAG ingestion, bulk page summarisation, content analysis.
Returns: Markdown string.
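Before handing arguments to thunderbit_distill, an agent wrapper might check them against the documented ranges. The sketch below is hypothetical: validateDistillArgs is not a server export. The bounds are copied from the options list. Using the WHATWG URL constructor to decide what counts as a valid url is an added assumption.

```typescript
type RenderMode = "none" | "basic" | "full";

interface DistillArgs {
  url: string;
  renderMode?: RenderMode;
  waitFor?: number;
  includeTags?: string[];
  excludeTags?: string[];
  countryCode?: string;
  timeout?: number;
  apiKey?: string;
}

const RENDER_MODES: readonly string[] = ["none", "basic", "full"];

// Returns a message when value is set but outside [min, max], otherwise null.
function rangeProblem(name: string, value: number | undefined, min: number, max: number): string | null {
  if (value === undefined || (value >= min && value <= max)) return null;
  return `${name} must be between ${min} and ${max}, got ${value}`;
}

// Hypothetical client-side check; returns an empty array when the args look valid.
function validateDistillArgs(args: DistillArgs): string[] {
  const problems: string[] = [];
  try {
    new URL(args.url); // throws a TypeError on malformed input
  } catch {
    problems.push(`url is not a valid absolute URL: ${args.url}`);
  }
  // Runtime check, since arguments often arrive as untyped JSON.
  if (args.renderMode !== undefined && !RENDER_MODES.includes(args.renderMode)) {
    problems.push(`renderMode must be one of ${RENDER_MODES.join(", ")}`);
  }
  if (args.countryCode !== undefined && !/^[A-Z]{2}$/.test(args.countryCode)) {
    problems.push("countryCode must be two uppercase letters, e.g. US");
  }
  for (const problem of [
    rangeProblem("waitFor", args.waitFor, 0, 10_000),
    rangeProblem("timeout", args.timeout, 5_000, 60_000),
  ]) {
    if (problem !== null) problems.push(problem);
  }
  return problems;
}
```

Catching these locally gives the model a precise message to act on, rather than a rejected tool call.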
2. Extract Tool (thunderbit_extract)
Extract structured data from a web page. The schema is a flat map of fieldName → natural-language instruction. Costs 20 credits per call.
Note: the upstream OpenAPI spec describes schema as JSON Schema. The live server expects the flat instruction map shown below; we're aligning the spec.
{
"name": "thunderbit_extract",
"arguments": {
"url": "https://example.com/product",
"schema": {
"name": "product name",
"price": "the listed price as a number",
"currency": "3-letter currency code",
"inStock": "true if the product is available, false otherwise"
},
"renderMode": "basic"
}
}
The result's data.data is always an array, even when you expect only a single record:
{
"data": {
"url": "https://example.com/product",
"data": [
{ "name": "iPhone 15 Pro", "price": 999, "currency": "USD", "inStock": true }
]
}
}
Extract Tool Options
- url: Web page URL (required)
- schema: Flat Record<string, string> mapping field name → instruction (required)
- renderMode: none | basic | full
- waitFor: Wait time in ms after page load (0–10000)
- timeout: Request timeout in ms (5000–120000)
- apiKey: Per-call key override
Best for: Lead gen, price monitoring, competitive analysis, dataset building.
Returns: data.data array of objects keyed by your schema's field names.
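Because data.data is always an array, client code can type the flat schema and the response once, then unwrap the single record explicitly. The interfaces below mirror the example response. singleRecord and missingFields are hypothetical helpers. This page does not document whether the API omits a field or returns null when the page lacks it, so missingFields only checks for absent keys.

```typescript
// Flat schema: field name -> natural-language instruction.
type ExtractSchema = Record<string, string>;

// Mirrors the documented response: data.data is always an array.
interface ExtractResponse<T> {
  data: { url: string; data: T[] };
}

interface Product {
  name: string;
  price: number;
  currency: string;
  inStock: boolean;
}

const productSchema: ExtractSchema = {
  name: "product name",
  price: "the listed price as a number",
  currency: "3-letter currency code",
  inStock: "true if the product is available, false otherwise",
};

// Unwraps the first record, failing loudly if the page produced none.
function singleRecord<T>(response: ExtractResponse<T>): T {
  const [first] = response.data.data;
  if (first === undefined) {
    throw new Error(`No records extracted from ${response.data.url}`);
  }
  return first;
}

// Lists schema fields that are not present as own keys on an extracted record.
function missingFields(schema: ExtractSchema, record: object): string[] {
  return Object.keys(schema).filter(
    (field) => !Object.prototype.hasOwnProperty.call(record, field),
  );
}
```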
3. Suggest Fields Tool (thunderbit_suggest_fields)
Analyse a page and propose extractable fields. Use this first when you don't know what data a page contains. Costs 1 credit per call.
{
"name": "thunderbit_suggest_fields",
"arguments": {
"url": "https://example.com/product",
"prompt": "Focus on pricing, availability, and shipping",
"countryCode": "US"
}
}
Suggest Fields Tool Options
- url: Web page URL to analyse (required)
- prompt: Optional steering hint for the AI
- countryCode: ISO 2-letter country code (default: US)
- apiKey: Per-call key override
Best for: Discovering schema before running extract; bootstrapping new scrape targets.
Returns: Array of { name, type, instruction } objects, ready to feed into thunderbit_extract.
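thunderbit_suggest_fields returns { name, type, instruction } objects, and thunderbit_extract takes a flat name → instruction map, so the hand-off is a small transformation. This is a sketch with a hypothetical toExtractSchema helper. It drops the type field because the flat map has nowhere to put it; you could fold it into the instruction text instead. The optional filter lets you prune fields before paying 20 credits per extraction.

```typescript
interface SuggestedField {
  name: string;
  type: string;
  instruction: string;
}

// Builds the flat name -> instruction map that thunderbit_extract expects.
function toExtractSchema(
  fields: SuggestedField[],
  keep: (field: SuggestedField) => boolean = () => true,
): Record<string, string> {
  const schema: Record<string, string> = {};
  for (const field of fields) {
    if (!keep(field)) continue;
    // Refuse silent overwrites if two suggestions share a name.
    if (Object.prototype.hasOwnProperty.call(schema, field.name)) {
      throw new Error(`Duplicate suggested field name: ${field.name}`);
    }
    schema[field.name] = field.instruction;
  }
  return schema;
}
```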
4. Batch Distill Create (thunderbit_batch_distill_create)
Submit up to 100 URLs for distillation. Returns a job ID — poll thunderbit_batch_distill_status until complete. Costs 1 credit per URL.
{
"name": "thunderbit_batch_distill_create",
"arguments": {
"urls": [
"https://example.com/page-1",
"https://example.com/page-2",
"https://example.com/page-3"
],
"timeout": 30000
}
}
Batch Distill Create Options
- urls: Array of URLs (1–100, required)
- timeout: Per-page request timeout in ms (5000–60000)
- apiKey: Per-call key override
Best for: Bulk RAG ingestion, full-site distillation when paired with the Map endpoint.
Returns: { id, status, total } — pass id to the status tool.
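A list longer than 100 URLs has to be split across several thunderbit_batch_distill_create calls. Below is a minimal sketch with a hypothetical chunkUrls helper. Dropping exact-duplicate URLs is a design choice. It assumes each submitted URL is billed at 1 credit whether or not it repeats.

```typescript
const MAX_BATCH_URLS = 100;

// Splits urls into batches of at most `size`, removing exact duplicates first.
function chunkUrls(urls: string[], size: number = MAX_BATCH_URLS): string[][] {
  if (!Number.isInteger(size) || size < 1 || size > MAX_BATCH_URLS) {
    throw new RangeError(`size must be an integer between 1 and ${MAX_BATCH_URLS}`);
  }
  const unique = Array.from(new Set(urls)); // a Set keeps first-seen order
  const batches: string[][] = [];
  for (let i = 0; i < unique.length; i += size) {
    batches.push(unique.slice(i, i + size));
  }
  return batches;
}
```

Each batch becomes the urls argument of one create call, and each call returns its own job id to poll.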
5. Batch Distill Status (thunderbit_batch_distill_status)
Poll a batch distill job and retrieve paginated results. Free.
{
"name": "thunderbit_batch_distill_status",
"arguments": {
"jobId": "550e8400-e29b-41d4-a716-446655440000",
"page": 0,
"pageSize": 20
}
}
Batch Distill Status Options
- jobId: Job ID from thunderbit_batch_distill_create (required)
- page: 0-based page index (default 0)
- pageSize: Results per page, 1–100 (default 20)
- apiKey: Per-call key override
Best for: Polling and final-result retrieval. Increment page until results comes back empty.
Returns: { id, status, total, completed, failed, creditsUsed, createdAt, completedAt, results: [{ index, url, success, markdown, error }, …] }. The job-level status is the enum (PENDING / PROCESSING / COMPLETED / FAILED / CANCELLED); each per-URL result item uses a boolean success field, not status.
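Collecting every result means walking page until results comes back empty, as described above. In the sketch below, fetchPage is a hypothetical function. You would implement it by calling thunderbit_batch_distill_status through your MCP client with the given page and pageSize. The response types are transcribed from the Returns line; treating markdown and error as optional is an assumption. The maxPages cap is a safety guard, not a documented limit.

```typescript
type JobStatus = "PENDING" | "PROCESSING" | "COMPLETED" | "FAILED" | "CANCELLED";

interface DistillResultItem {
  index: number;
  url: string;
  success: boolean; // per-URL outcome is a boolean, not the job-level enum
  markdown?: string;
  error?: string;
}

interface BatchDistillStatus {
  id: string;
  status: JobStatus;
  total: number;
  completed: number;
  failed: number;
  creditsUsed: number;
  createdAt: string;
  completedAt?: string | null;
  results: DistillResultItem[];
}

type FetchStatusPage = (page: number, pageSize: number) => Promise<BatchDistillStatus>;

// Reads pages 0, 1, 2, ... and stops at the first page with no results.
async function collectAllResults(
  fetchPage: FetchStatusPage,
  pageSize = 20,
  maxPages = 1_000,
): Promise<DistillResultItem[]> {
  const all: DistillResultItem[] = [];
  for (let page = 0; page < maxPages; page++) {
    const { results } = await fetchPage(page, pageSize);
    if (results.length === 0) return all;
    all.push(...results);
  }
  throw new Error(`Stopped after ${maxPages} pages without reaching an empty page`);
}
```

Filtering the collected items on success then separates usable Markdown from URLs to retry or report.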
6. Batch Extract Create (thunderbit_batch_extract_create)
Submit up to 100 URLs for extraction with a single shared schema. Costs 20 credits per URL.
{
"name": "thunderbit_batch_extract_create",
"arguments": {
"urls": [
"https://example.com/product-1",
"https://example.com/product-2"
],
"schema": {
"name": "product name",
"price": "the listed price as a number"
},
"timeout": 60000
}
}
Batch Extract Create Options
- urls: Array of URLs (1–100, required)
- schema: Flat Record<string, string> (field → instruction) applied to every URL (required)
- timeout: Per-page request timeout in ms (5000–120000)
- apiKey: Per-call key override
Best for: Catalog scraping, large-scale dataset building.
Returns: { id, status, total }.
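The per-call and per-URL prices listed for each tool let you estimate a job before submitting it. Below is a minimal sketch with a hypothetical estimateCredits helper. It only multiplies the documented prices. It does not model whether failed URLs are charged; the status tool's creditsUsed field presumably reports what a job actually consumed.

```typescript
type ThunderbitTool =
  | "thunderbit_distill"
  | "thunderbit_extract"
  | "thunderbit_suggest_fields"
  | "thunderbit_batch_distill_create"
  | "thunderbit_batch_distill_status"
  | "thunderbit_batch_extract_create"
  | "thunderbit_batch_extract_status";

// Documented prices: per call for single-URL tools, per URL for batch creates; status polls are free.
const CREDITS_PER_UNIT: Record<ThunderbitTool, number> = {
  thunderbit_distill: 1,
  thunderbit_extract: 20,
  thunderbit_suggest_fields: 1,
  thunderbit_batch_distill_create: 1,
  thunderbit_batch_distill_status: 0,
  thunderbit_batch_extract_create: 20,
  thunderbit_batch_extract_status: 0,
};

// Each entry is [tool, units]: units = calls for single-URL tools, URLs for batch creates.
function estimateCredits(plan: Array<[ThunderbitTool, number]>): number {
  return plan.reduce((sum, [tool, units]) => sum + CREDITS_PER_UNIT[tool] * units, 0);
}
```

Example: one suggestion call, one trial extraction, and a 250-URL batch extraction (split into three jobs) cost 1 + 20 + 5,000 = 5,021 credits. Status polling is free.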
7. Batch Extract Status (thunderbit_batch_extract_status)
Poll a batch extract job. Free.
{
"name": "thunderbit_batch_extract_status",
"arguments": {
"jobId": "550e8400-e29b-41d4-a716-446655440000",
"page": 0,
"pageSize": 20
}
}
Same options shape as thunderbit_batch_distill_status. Returns paginated extracted data per URL.
Recommended workflow
1. thunderbit_suggest_fields: see what data the page exposes
2. thunderbit_extract (or thunderbit_distill): pull a single URL
3. thunderbit_batch_*_create: fan out to up to 100 URLs
4. thunderbit_batch_*_status: poll until the job reaches a terminal status
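Step 4 can be sketched as a polling loop with exponential backoff. It stops on any terminal job status: COMPLETED, FAILED, or CANCELLED, per the status enum. checkStatus is a hypothetical function that calls thunderbit_batch_distill_status or thunderbit_batch_extract_status for one job. The delays and attempt cap are arbitrary assumptions to tune, not documented values.

```typescript
type JobStatus = "PENDING" | "PROCESSING" | "COMPLETED" | "FAILED" | "CANCELLED";

interface JobProgress {
  id: string;
  status: JobStatus;
  total: number;
  completed: number;
  failed: number;
}

interface PollOptions {
  initialDelayMs?: number; // first wait between polls
  maxDelayMs?: number; // cap for the doubling backoff
  maxAttempts?: number; // give up after this many status calls
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

const TERMINAL_STATUSES: ReadonlySet<JobStatus> = new Set<JobStatus>(["COMPLETED", "FAILED", "CANCELLED"]);

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls until the job is terminal, doubling the wait after each non-terminal status.
async function pollUntilTerminal(
  checkStatus: () => Promise<JobProgress>,
  options: PollOptions = {},
): Promise<JobProgress> {
  const { initialDelayMs = 2_000, maxDelayMs = 30_000, maxAttempts = 60, sleep = defaultSleep } = options;
  let delayMs = initialDelayMs;
  for (let attempt = 1; ; attempt++) {
    const progress = await checkStatus();
    if (TERMINAL_STATUSES.has(progress.status)) return progress;
    if (attempt >= maxAttempts) {
      throw new Error(`Job did not reach a terminal status after ${maxAttempts} polls`);
    }
    await sleep(delayMs);
    delayMs = Math.min(delayMs * 2, maxDelayMs);
  }
}
```

A terminal status only means the job stopped. Check completed and failed, or the per-URL success flags, before treating the batch as usable.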
Error Handling
Every tool returns errors as MCP tool errors (isError: true) with a structured hint, so the model can decide whether to retry or surface the failure to the user.
// Shape of the tool result the host receives when a call fails
const toolResult = {
isError: true,
content: [{
type: "text",
text: "Thunderbit API error (402): INSUFFICIENT_CREDITS — Top up at https://thunderbit.com/billing"
}]
};
| HTTP | Code | Hint emitted by the server |
|---|---|---|
| 401 | API_KEY_INVALID_FORMAT / API_KEY_NOT_FOUND | "Check your API key at https://app.thunderbit.com/console" |
| 402 | INSUFFICIENT_CREDITS | "Top up at https://thunderbit.com/billing" |
| 429 | RATE_LIMIT_EXCEEDED | "Rate limit exceeded, retry later" |
| 5xx | INTERNAL_ERROR / DISTILL_FAILED | (no hint, server message passed through) |
Full code list: API Reference → Errors.
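On the client side, the text of an isError result can be parsed back into a status, code, and hint so an agent can decide whether to retry. This sketch assumes errors follow the layout of the example above: a fixed prefix, the HTTP status in parentheses, the code, then an optional hint after a dash. The table notes that 5xx errors pass the server message through, so those may not match, and parseToolError then returns undefined. Treating 429 and 5xx as retryable follows the table's hints but is a client policy choice, not server behavior.

```typescript
interface ThunderbitToolError {
  httpStatus: number;
  code: string;
  hint?: string;
  retryable: boolean;
}

// Assumed layout: "Thunderbit API error (<status>): <CODE>", optionally followed by a U+2014 dash and a hint.
const TOOL_ERROR_PATTERN = /^Thunderbit API error \((\d{3})\): ([A-Z0-9_]+)(?:\s+\u2014\s+(.+))?/;

// Returns undefined when the text does not follow the assumed layout.
function parseToolError(text: string): ThunderbitToolError | undefined {
  const match = TOOL_ERROR_PATTERN.exec(text);
  if (match === null) return undefined;
  const httpStatus = Number(match[1]);
  return {
    httpStatus,
    code: match[2],
    hint: match[3], // undefined when no hint was emitted
    // Policy choice: rate limits and server errors are worth retrying; 401/402 are not.
    retryable: httpStatus === 429 || httpStatus >= 500,
  };
}
```

An agent might retry retryable errors after a delay and otherwise surface the hint to the user unchanged.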
Troubleshooting
Tools don't show up in the host. Fully restart the host after editing the config. The server logs a startup line on stderr, so look for it in the host's MCP log file.
Cannot find module '@thunderbit/mcp-server'. Either pre-install the package (npm install -g @thunderbit/mcp-server) or allow npx network access in the host's sandbox.
401 on first run. The env block is per-server, so verify the key isn't sitting in the wrong server entry. To isolate the problem, try the manual install path with the key exported in your shell.
Batch jobs stall. The MCP tool only submits and polls; it doesn't keep a long-running connection. Bump THUNDERBIT_TIMEOUT_MS to 180000+ for large batches, or pair batch_*_create with a webhook in your application to fire-and-forget.
Build your own
Want a custom tool surface (different prompts, scoping, filtering, extra tools)? See the MCP integration guide for a from-scratch server build with @modelcontextprotocol/sdk.
Open Source
@thunderbit/mcp-server is MIT-licensed and open source; the code is in its GitHub repository.
Related
- CLI — same operations, from the shell
- Quickstart — raw HTTP if you'd rather not use MCP
- API Reference — endpoint details