Best Practices

Production tips for the Thunderbit Open API: concurrency, retries with backoff, idempotency, error handling, schema design, and cost-efficient batch sizing.

  • Cache aggressively. Distill responses are deterministic for static pages — cache the markdown by URL hash on your side and bypass with forceRefresh: true only when you need fresh data.
  • Use include over legacy booleans. includeHtml: true and extractLinks: true still work, but the new include: ["metadata", "links"] array is composable and easier to read in code review.
  • Prefer batch over loops. A batch of 50 URLs returns one job ID; 50 individual /distill calls burn rate limit and concurrency.
  • Use webhooks for jobs of more than 10 URLs. Polling every 5 seconds for a 5-minute job costs about 60 round-trips, and all but the last return nothing new. See Webhooks.
  • Wait only when you need to. The waitFor: 2000 delay is paid on every request and can double your latency budget, so set it only for SPAs that hydrate slowly.
  • Pin a countryCode when scraping geo-aware sites (pricing, search results, e-commerce). If you omit it, the country defaults to US.
  • Start with renderMode: "none" and upgrade to basic or full only if the page returns empty — most pages don't need a headless browser. See Render Modes.
  • Be specific in schemas. Field descriptions are read by the AI; "product MSRP in USD before discount" extracts more reliably than "price". See Schema Design.
  • Make webhook handlers idempotent. A webhook can fire more than once for the same job ID during a network partition, so processing the same delivery twice must be harmless.
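
The caching tip can be sketched as follows. This is a minimal illustration, not the official client: the fetch callable, the DistillCache class, and the in-memory dict are all hypothetical stand-ins, and the only names taken from this page are /distill and forceRefresh.

```python
import hashlib
from typing import Callable, Dict

# Hypothetical signature: fetch(url, force_refresh) -> markdown string.
# In practice this would wrap your own call to /distill and pass
# forceRefresh: true when force_refresh is set.
Fetch = Callable[[str, bool], str]


def url_key(url: str) -> str:
    """Stable cache key: SHA-256 hex digest of the URL."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


class DistillCache:
    """Caches markdown by URL hash; an assumed helper, not part of the API."""

    def __init__(self, fetch: Fetch) -> None:
        self._fetch = fetch
        self._store: Dict[str, str] = {}  # swap for Redis, disk, etc.

    def get(self, url: str, force_refresh: bool = False) -> str:
        key = url_key(url)
        if not force_refresh and key in self._store:
            return self._store[key]  # cache hit: no API call is made
        markdown = self._fetch(url, force_refresh)
        self._store[key] = markdown  # a forced refresh also updates the cache
        return markdown
```

Hashing the URL keeps keys a fixed length regardless of query-string size. If your URLs differ only in tracking parameters, normalizing them before hashing would raise the hit rate.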
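
For the batching tip, a long URL list first has to be split into batch-sized groups. The helper below is a sketch under one assumption: the default size of 50 simply mirrors the example above and is not a documented limit. Submitting each group to the batch endpoint is left to your own client code.

```python
from typing import Iterator, List, Sequence


def chunked(urls: Sequence[str], size: int = 50) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` URLs, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield list(urls[start:start + size])
```

Each yielded group would map to one batch job, so 120 URLs become three jobs instead of 120 individual /distill calls.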
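
The render-mode tip amounts to a cheapest-first fallback loop. In this sketch the fetch callable and the function name are hypothetical, and treating whitespace-only markdown as "empty" is an assumption about what an empty page looks like. Only the mode names none, basic, and full come from this page.

```python
from typing import Callable, Optional, Tuple

RENDER_MODES = ("none", "basic", "full")  # cheapest first, as named above

# Hypothetical signature: fetch(url, render_mode) -> markdown string.
Fetch = Callable[[str, str], str]


def distill_with_escalation(url: str, fetch: Fetch) -> Tuple[Optional[str], str]:
    """Try each render mode in order and stop at the first non-empty result.

    Returns (mode, markdown), or (None, "") if every mode came back empty.
    """
    for mode in RENDER_MODES:
        markdown = fetch(url, mode)
        if markdown.strip():  # assumption: whitespace-only output counts as empty
            return mode, markdown
    return None, ""
```

Recording the mode that succeeded for each domain lets later requests start at that mode instead of repeating the failed attempts.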
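
One way to make a webhook handler idempotent is to remember which job IDs have already been processed. The sketch below makes several assumptions: the payload field name jobId is a guess rather than a documented field, the process callable is a placeholder for your own logic, and the in-memory set would need to be a durable store (a database unique key, for example) in production.

```python
from typing import Any, Callable, Dict, Set


class IdempotentWebhookHandler:
    """Processes each job ID at most once; an assumed helper, not part of the API."""

    def __init__(self, process: Callable[[Dict[str, Any]], None]) -> None:
        self._process = process
        self._seen: Set[str] = set()  # in-memory only; use a durable store in production

    def handle(self, payload: Dict[str, Any]) -> bool:
        """Return True if the payload was processed, False if it was a duplicate."""
        job_id = payload["jobId"]  # assumed field name, check the webhook docs
        if job_id in self._seen:
            return False  # duplicate delivery: acknowledge and do nothing
        self._process(payload)
        self._seen.add(job_id)  # marked only after success, so a failure can be redelivered
        return True
```

This version is not safe under concurrent deliveries of the same job ID. With multiple workers, an atomic insert into the durable store should replace the check-then-add on the set.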