FAQ

Common questions about the Thunderbit Open API: pricing, rate limits, JSON schema vs prompt, robots.txt, regional access, and how to handle JS-heavy sites.

Q: Can I scrape sites that require login? A: Today, no. You can pass cookies or auth tokens via the headers parameter for sites that accept them, but interactive login flows aren't supported via the API. Reach out for enterprise options.
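As a rough illustration of passing credentials through the headers parameter, the sketch below builds a request body that forwards a session cookie and a bearer token. Only the headers parameter comes from the answer above; the `url` field name and the `build_scrape_request` helper are assumptions made for this example, so check the endpoint reference for the exact body shape.

```python
from typing import Dict, Optional


def build_scrape_request(
    url: str,
    cookies: Optional[Dict[str, str]] = None,
    bearer_token: Optional[str] = None,
) -> dict:
    """Build a request body that forwards session credentials via `headers`.

    Hypothetical helper: the `url` key is an assumed field name.
    """
    headers: Dict[str, str] = {}
    if cookies:
        # Standard Cookie header format: name=value pairs joined by "; ".
        headers["Cookie"] = "; ".join(f"{k}={v}" for k, v in cookies.items())
    if bearer_token:
        headers["Authorization"] = f"Bearer {bearer_token}"
    return {"url": url, "headers": headers}


body = build_scrape_request(
    "https://example.com/account",
    cookies={"session": "abc123"},
    bearer_token="my-token",
)
```

This only works for sites that accept a pre-obtained cookie or token; it does not perform the login itself.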

Q: What's the maximum page size? A: 10 MB of HTML before processing. Pages exceeding this return SCRAPE_CONTENT_TOO_LARGE.

Q: How fresh is the data? A: By default, every call fetches the page live. Set forceRefresh: true to explicitly bypass any internal caching layer.
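A minimal sketch of opting into forceRefresh on an existing request body. The forceRefresh field is taken from the answer above; the `url` key and the `with_force_refresh` helper are assumptions for illustration.

```python
import json
from typing import Any, Dict


def with_force_refresh(body: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a request body with forceRefresh enabled."""
    updated = dict(body)  # shallow copy so the caller's dict is not mutated
    updated["forceRefresh"] = True
    return updated


request_body = with_force_refresh({"url": "https://example.com/pricing"})
payload = json.dumps(request_body)  # Python True is serialized as JSON true
```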

Q: Can I run multiple batch jobs in parallel? A: Yes. Each batch is limited to 100 URLs, and the number of batch jobs you can run concurrently is limited only by your plan's concurrency.
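To stay within the 100-URL per-batch limit, a long URL list can be split client-side before submission. The sketch below shows one way to do that; `chunk_urls` is a hypothetical helper, and submitting each chunk (and respecting your plan's concurrency) is left to your own client code.

```python
from typing import Iterator, List, Sequence

MAX_URLS_PER_BATCH = 100  # per-batch limit stated in the answer above


def chunk_urls(
    urls: Sequence[str], size: int = MAX_URLS_PER_BATCH
) -> Iterator[List[str]]:
    """Yield consecutive slices of `urls`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield list(urls[start:start + size])


urls = [f"https://example.com/item/{i}" for i in range(250)]
batches = list(chunk_urls(urls))
# 250 URLs are split into batches of 100, 100, and 50.
```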

Q: What happens if a single URL in a batch fails? A: The batch keeps going. The failing URL gets status: "FAILED" with an error code; the rest succeed. The job moves to COMPLETED once all URLs reach a terminal state.
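Once a job reaches COMPLETED, the per-URL results can be separated into failures and everything else, for example to retry only the failed URLs. In the sketch below, the "FAILED" status value comes from the answer above; the `url` and `error` keys, the "SUCCEEDED" value in the sample data, and the overall result shape are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def split_results(results: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Split per-URL results of a finished batch into (non-failed, failed)."""
    failed = [r for r in results if r.get("status") == "FAILED"]
    ok = [r for r in results if r.get("status") != "FAILED"]
    return ok, failed


# Sample data with an assumed shape; real responses may differ.
sample = [
    {"url": "https://example.com/a", "status": "SUCCEEDED"},
    {"url": "https://example.com/b", "status": "FAILED",
     "error": "SCRAPE_CONTENT_TOO_LARGE"},
    {"url": "https://example.com/c", "status": "SUCCEEDED"},
]

ok, failed = split_results(sample)
retry_urls = [r["url"] for r in failed]  # candidates for a follow-up batch
```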

Q: Does the API respect robots.txt? A: We honor robots.txt for distillation by default. Enterprise plans can request an override on a per-domain basis with proof of authorization.

Q: Can I use both schema and prompt on /extract? A: No — they're mutually exclusive (SCHEMA_AND_PROMPT_EXCLUSIVE). Today, schema is the only supported mode; prompt-only extraction is on the roadmap.
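A client can catch this mistake before the request is sent. The sketch below is a hypothetical pre-flight check that mirrors the rule described above; `build_extract_body` and the `url` key are assumptions, and the server remains the source of truth for validation.

```python
from typing import Any, Dict, Optional


def build_extract_body(
    url: str,
    schema: Optional[Dict[str, Any]] = None,
    prompt: Optional[str] = None,
) -> Dict[str, Any]:
    """Build an /extract body, rejecting combinations the API would refuse."""
    if schema is not None and prompt is not None:
        # Mirrors the SCHEMA_AND_PROMPT_EXCLUSIVE error code.
        raise ValueError("schema and prompt are mutually exclusive")
    if schema is None:
        # Schema is currently the only supported mode.
        raise ValueError("schema is required; prompt-only is not yet supported")
    return {"url": url, "schema": schema}


body = build_extract_body(
    "https://example.com/product",
    schema={"type": "object", "properties": {"title": {"type": "string"}}},
)
```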

Q: How do I get notified when a long batch finishes? A: Use the webhook field on submission. Polling works too, but webhooks are cheaper for jobs lasting longer than 1 minute. See Webhooks.
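If you choose polling instead of a webhook, a simple loop with a fixed interval and an overall timeout is usually enough. The sketch below takes the status lookup as a callable so it stays independent of any HTTP client. The COMPLETED state comes from this FAQ; `wait_for_batch`, the `status` key, and the "RUNNING" value in the demo are assumptions for illustration.

```python
import time
from typing import Callable, Dict


def wait_for_batch(
    fetch_status: Callable[[], Dict],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll `fetch_status` until the job reports COMPLETED or the timeout hits."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_status()
        if job.get("status") == "COMPLETED":
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError("batch did not complete before the timeout")
        sleep(interval_s)


# Demo with a stand-in for the real status call (no network access).
_responses = iter([{"status": "RUNNING"}, {"status": "COMPLETED"}])
job = wait_for_batch(lambda: next(_responses), interval_s=0, sleep=lambda s: None)
```

Each poll is an extra request, which is why a webhook tends to be the better fit for long-running jobs.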

Q: How do I report a bug or request a feature? A: Email support@thunderbit.com or use the in-app contact form on the dashboard.