FAQ
Common questions about the Thunderbit Open API: pricing, rate limits, JSON schema vs prompt, robots.txt, regional access, and how to handle JS-heavy sites.
Q: Can I scrape sites that require login?
A: Not interactively. For sites that accept them, you can pass cookies or auth tokens via the headers parameter, but interactive login flows aren't supported through the API today. Reach out for enterprise options.
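As a rough illustration of forwarding an existing session through the headers parameter, the sketch below builds a request body in Python. Only the headers parameter comes from this FAQ; the `url` field name, the dictionary shape of `headers`, and the helper `build_scrape_request` are assumptions made for the example, so check the endpoint reference for the real request shape.

```python
from typing import Dict, Optional


def build_scrape_request(
    url: str,
    cookies: Optional[Dict[str, str]] = None,
    bearer_token: Optional[str] = None,
) -> dict:
    """Build a request body that forwards session credentials via `headers`.

    Hypothetical helper: the body layout ({"url": ..., "headers": {...}})
    is an assumption, not the documented request schema.
    """
    headers: Dict[str, str] = {}
    if cookies:
        # Standard Cookie header format: name=value pairs joined by "; ".
        headers["Cookie"] = "; ".join(f"{k}={v}" for k, v in cookies.items())
    if bearer_token:
        headers["Authorization"] = f"Bearer {bearer_token}"

    body: dict = {"url": url}
    if headers:
        body["headers"] = headers
    return body
```

The cookies would typically be copied from a browser session that is already logged in. The API itself does not perform the login.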
Q: What's the maximum page size?
A: 10 MB of HTML before processing. Pages exceeding this return SCRAPE_CONTENT_TOO_LARGE.
Q: How fresh is the data?
A: By default, every call fetches the page live. Set forceRefresh: true to explicitly bypass any internal caching layer.
Q: Can I run multiple batch jobs in parallel?
A: Yes. The per-batch limit is 100 URLs, but there's no cap on the number of concurrent batch jobs (subject to your plan's concurrency).
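For URL lists longer than the 100-URL per-batch limit, one simple approach is to split the list into chunks and submit each chunk as its own batch job. The sketch below covers only the splitting step. The helper name `chunk_urls` is hypothetical, and how each chunk is submitted depends on the batch endpoint, which is not shown here.

```python
from typing import Iterator, List, Sequence

MAX_URLS_PER_BATCH = 100  # per-batch limit stated in this FAQ


def chunk_urls(
    urls: Sequence[str], size: int = MAX_URLS_PER_BATCH
) -> Iterator[List[str]]:
    """Yield consecutive slices of `urls`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield list(urls[start:start + size])


# Example: 250 URLs become three batches of 100, 100, and 50.
urls = [f"https://example.com/page/{i}" for i in range(250)]
batches = list(chunk_urls(urls))
```

How many of those batches can usefully run at once is bounded by your plan's concurrency.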
Q: What happens if a single URL in a batch fails?
A: The batch keeps going. The failing URL gets status: "FAILED" with an error code; the rest succeed. The job moves to COMPLETED once all URLs reach a terminal state.
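Because a COMPLETED job can still contain failed URLs, client code usually needs to separate the two groups before using the data. The sketch below assumes each per-URL result is a dictionary with a `status` key, as described above. The `url` and `error` key names and the "SUCCEEDED" status value in the usage example are illustrative guesses, not documented fields.

```python
from typing import Dict, List, Tuple


def split_batch_results(results: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Partition per-URL results into (non-failed, failed).

    Only the "FAILED" status value comes from the FAQ; any other status is
    treated as non-failed here, which is safe once the job is COMPLETED and
    every URL has reached a terminal state.
    """
    succeeded: List[Dict] = []
    failed: List[Dict] = []
    for item in results:
        if item.get("status") == "FAILED":
            failed.append(item)
        else:
            succeeded.append(item)
    return succeeded, failed


# Usage with made-up result items (key names are assumptions):
sample = [
    {"url": "https://example.com/a", "status": "SUCCEEDED"},
    {"url": "https://example.com/b", "status": "FAILED",
     "error": "SCRAPE_CONTENT_TOO_LARGE"},
]
ok, bad = split_batch_results(sample)
retry_urls = [item["url"] for item in bad]
```

Collecting the failed URLs this way makes it straightforward to resubmit only those in a follow-up batch.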
Q: Does the API respect robots.txt?
A: We honor robots.txt for distillation by default. Enterprise plans can request an override on a per-domain basis with proof of authorization.
Q: Can I use both schema and prompt on /extract?
A: No — they're mutually exclusive (SCHEMA_AND_PROMPT_EXCLUSIVE). Today, schema is the only supported mode; prompt-only extraction is on the roadmap.
Q: How do I get notified when a long batch finishes?
A: Use the webhook field when you submit the job. Polling works too, but webhooks are cheaper for jobs lasting longer than 1 minute. See Webhooks.
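If you do poll, a bounded loop with a fixed interval and an overall timeout is usually enough. The sketch below is deliberately transport-agnostic: `get_status` is a hypothetical callable that you supply, and it should return the job's current status string from whatever status endpoint the API exposes. Only the COMPLETED state comes from this FAQ.

```python
import time
from typing import Callable


def wait_for_job(
    get_status: Callable[[], str],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll `get_status` until it returns "COMPLETED" or `timeout` elapses.

    `sleep` and `clock` are injectable so the loop can be tested without
    real waiting. Raises TimeoutError if the deadline passes first.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == "COMPLETED":
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} seconds")
        sleep(interval)
```

Each polling request is an extra API call, which is why a webhook tends to be the cheaper option for long-running jobs.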
Q: How do I report a bug or request a feature?
A: Email support@thunderbit.com or use the in-app contact form on the dashboard.