Configure ingestion
The ingestion config is optional. If you omit it, Jockey uses its default extraction. Configure it when you know what your use case needs: a config helps Jockey emphasize the right signals and produce more reliable extraction.
Pass the ingestion config when you create a knowledge store.
Choosing an approach
Natural language description
Describe what matters in plain English. Jockey converts your description to a schema internally.
Best for exploratory work when you don’t know the exact fields you need.
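As a rough illustration, a natural language config might look like the sketch below. The key names `type` and `description` are assumptions made for this example, not Jockey's documented field names; check the API reference for the actual shape.

```python
# Hypothetical payload shape: the keys "type" and "description" are
# illustrative assumptions, not confirmed Jockey field names.
description = (
    "Focus on product demos: capture the product name, the features shown, "
    "and any pricing mentioned on screen or in speech."
)

ingestion_config = {
    "type": "description",       # assumed discriminator for the plain-English approach
    "description": description,  # the plain-English text Jockey would turn into a schema
}
```

Naming the entities you care about (product, features, pricing) gives the description something concrete to anchor on while still leaving the exact fields open.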
JSON Schema
Provide a JSON Schema (draft 2020-12) for precise, structured extraction.
Best for production systems where downstream code expects specific fields.
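A minimal sketch of what such a schema could look like, written as a Python dictionary and serialized to JSON. The field names (`product_name`, `features`, `price_mentioned`) are invented for illustration; only the `$schema` URI is the standard draft 2020-12 identifier.

```python
import json

# Illustrative schema: the property names are hypothetical examples,
# chosen to show the fields downstream code might expect.
product_demo_schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {
        "product_name": {"type": "string"},
        "features": {"type": "array", "items": {"type": "string"}},
        "price_mentioned": {"type": "boolean"},
    },
    # Only fields that downstream code cannot work without are required.
    "required": ["product_name"],
}

# Serialize to a JSON string, for example to store alongside application code.
schema_json = json.dumps(product_demo_schema, indent=2)
```

Keeping `required` short means a record missing an optional field is still valid, which suits extraction from content where not every field appears.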
Pitfalls
- Overly specific schemas can limit extraction. A schema that is too narrow can cause Jockey to miss relevant content. Start broad, then narrow the schema as you learn what you need.
- General descriptions can lead to broad extraction. A description that is too general can return more than you need. Name the fields, entities, or patterns you want Jockey to emphasize.
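One way to apply the "start broad, then narrow" advice is to keep a broad schema as the baseline and derive narrower versions from it. The sketch below assumes hypothetical field names and a hypothetical helper `narrow`; it is not part of Jockey.

```python
import copy

# Broad starting point: a few optional, loosely typed fields (names are illustrative).
broad_schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {
        "topics": {"type": "array", "items": {"type": "string"}},
        "people": {"type": "array", "items": {"type": "string"}},
        "summary": {"type": "string"},
    },
}


def narrow(schema, keep, required):
    """Return a copy of schema with only the `keep` properties, marking `required` ones."""
    missing = set(required) - set(keep)
    if missing:
        raise ValueError(f"required fields not kept: {sorted(missing)}")
    narrowed = copy.deepcopy(schema)  # leave the broad schema untouched
    narrowed["properties"] = {
        name: spec for name, spec in narrowed["properties"].items() if name in keep
    }
    narrowed["required"] = sorted(required)
    return narrowed


# After reviewing results from the broad schema, keep only the fields that proved useful.
narrow_schema = narrow(broad_schema, keep={"topics", "summary"}, required={"summary"})
```

Because `narrow` copies rather than mutates, the broad schema stays available for comparison if the narrowed version starts missing content.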
Jupyter notebook
Download the notebook to run this guide interactively.