Designing JSON Schemas the AI can extract reliably

The schema you pass to /extract doubles as a prompt: the model reads every field name, description, and type hint. A well-shaped schema improves extraction accuracy.

Field naming

Use names that read like English. The model handles a descriptive name such as productName far better than an abbreviation or placeholder such as pn or name1.

{ "type": "object", "properties": {
  "productName": { "type": "string" },
  "currentPrice": { "type": "number" }
} }

Field descriptions

Add a description to any field whose meaning is ambiguous. A field named "price" could mean MSRP, current price, or per-unit price, so state which one you want:

{ "type": "object", "properties": {
  "currentPrice": {
    "type": "number",
    "description": "Final price after discount, in USD"
  }
} }

Required vs optional

Mark as required only the fields you truly need. If the model can't find a required field, the entire extraction fails, so use required sparingly.
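
As a rough sketch, the standard JSON Schema required keyword lists the property names that must be present; anything not listed is optional. This assumes /extract honors required in the usual JSON Schema way. The discountCode field is hypothetical, added only for illustration:

```json
{ "type": "object", "properties": {
  "productName": { "type": "string" },
  "currentPrice": { "type": "number" },
  "discountCode": { "type": "string" }
},
"required": ["productName"] }
```

Under that assumption, a page that shows no discount code would not fail the extraction, because only productName is marked as required.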

Nesting

Prefer one level of nesting where useful (address.city). Deeper nesting (3+ levels) tends to hurt extraction quality.
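
A minimal sketch of one level of nesting, matching the address.city example above. The street and postalCode fields are hypothetical additions for illustration:

```json
{ "type": "object", "properties": {
  "address": {
    "type": "object",
    "properties": {
      "street": { "type": "string" },
      "city": { "type": "string" },
      "postalCode": { "type": "string" }
    }
  }
} }
```

Each further object nested inside address would add another level, so it is probably worth flattening anything that would go three or more levels deep.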

Common pitfalls

  • Using an ambiguous type, such as string for a numeric value like "$19.99": prefer number and let the model parse the value
  • Vague enums without descriptions
  • Required fields that aren't actually present on every page
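
The first two pitfalls can be sketched together in one schema. The availability field and its values are hypothetical, chosen only for illustration. Here currentPrice is typed as a number rather than a string, and the enum carries a description that says what its values mean:

```json
{ "type": "object", "properties": {
  "currentPrice": {
    "type": "number",
    "description": "Final price after discount, in USD, without the currency symbol"
  },
  "availability": {
    "type": "string",
    "enum": ["in_stock", "out_of_stock", "preorder"],
    "description": "Stock status shown on the page; use preorder for items not yet released"
  }
} }
```

Neither field is listed as required, which sidesteps the third pitfall for pages that omit a price or stock status.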

This page is being expanded with a schema cookbook — check back soon.