Configure ingestion

Ingestion config is optional. If you omit it, Jockey uses the default extraction. Configure it when you know what your use case needs: it helps Jockey emphasize the right signals and produce more reliable extraction.

Pass the ingestion config when you create a knowledge store.
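As a rough sketch of what "pass the ingestion config" can look like in client code, the Python below assembles a request body with an optional ingestion_config key. The helper name knowledge_store_body and the "name" field are hypothetical and only there for illustration; the create call itself is not shown, so check the API reference for the real endpoint and its required fields.

```python
import json

# Enrichment settings copied from the natural language example in this guide.
ingestion_config = {
    "enrichment_config": {
        "description": (
            "Focus on brand mentions, product appearances, "
            "audience reactions, and visual tone"
        )
    }
}


def knowledge_store_body(store_fields, ingestion_config=None):
    """Return a JSON-serializable body for a create request.

    store_fields is a hypothetical dict holding whatever other fields the
    create call needs. The ingestion_config key is left out entirely when
    no config is given, so the default extraction applies.
    """
    body = dict(store_fields)
    if ingestion_config is not None:
        body["ingestion_config"] = ingestion_config
    return body


# "name" is an assumed field, not one documented in this guide.
body = knowledge_store_body({"name": "campaign-footage"}, ingestion_config)
print(json.dumps(body, indent=2))
```

Leaving the key out, rather than sending an empty object, keeps the "no config" case identical to the default described above.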

Choosing an approach

Scenario | Approach
Getting started or quick prototyping | Default (no config)
You know the domain but not the exact fields | Natural language description
Downstream code expects specific typed fields | JSON Schema
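The three approaches can be folded into one small helper. This is a minimal sketch: build_ingestion_config is a hypothetical name, and treating description and json_schema as mutually exclusive is an assumption of this sketch, not something this guide states.

```python
def build_ingestion_config(description=None, json_schema=None):
    """Return the config fragment for the chosen approach.

    - neither argument: {} (default extraction, no config sent)
    - description:      natural language description
    - json_schema:      JSON Schema for typed fields
    """
    if description is not None and json_schema is not None:
        # Assumption of this sketch: pick one approach per knowledge store.
        raise ValueError("pass either description or json_schema, not both")
    if description is not None:
        enrichment = {"description": description}
    elif json_schema is not None:
        enrichment = {"json_schema": json_schema}
    else:
        return {}
    return {"ingestion_config": {"enrichment_config": enrichment}}


# Default: nothing to send.
print(build_ingestion_config())

# You know the domain but not the exact fields.
print(build_ingestion_config(description="Focus on brand mentions and visual tone"))
```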

Natural language description

Describe what matters in plain English. Jockey converts your description to a schema internally.

{
  "ingestion_config": {
    "enrichment_config": {
      "description": "Focus on brand mentions, product appearances, audience reactions, and visual tone"
    }
  }
}

Best for exploratory work when you don’t know the exact fields you need.

JSON Schema

Provide a JSON Schema (draft 2020-12) for precise, structured extraction.

{
  "ingestion_config": {
    "enrichment_config": {
      "json_schema": {
        "type": "object",
        "properties": {
          "people_count": {"type": "integer"},
          "location": {"type": "string"},
          "suspicious_activity": {"type": "boolean"},
          "description": {"type": "string"}
        }
      }
    }
  }
}

Best for production systems where downstream code expects specific fields.
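When downstream code depends on typed fields, it can help to check an extracted record against the same schema before using it. The sketch below is a deliberately simplified, standard-library-only type check for flat schemas like the one above, not a full JSON Schema validator. The shape of the extracted record is an assumption, since this guide does not show Jockey's output format, and type_errors is a hypothetical helper name.

```python
# The schema from the example above.
SCHEMA = {
    "type": "object",
    "properties": {
        "people_count": {"type": "integer"},
        "location": {"type": "string"},
        "suspicious_activity": {"type": "boolean"},
        "description": {"type": "string"},
    },
}

# Simplified mapping from JSON Schema type names to Python types.
_PY_TYPES = {"integer": int, "string": str, "boolean": bool}


def type_errors(record, schema):
    """Return a list of type mismatches between record and schema."""
    errors = []
    for field, spec in schema["properties"].items():
        if field not in record:
            # The schema has no "required" list, so absent fields are allowed.
            continue
        value = record[field]
        # bool is a subclass of int in Python, so reject True/False
        # explicitly for non-boolean fields.
        if isinstance(value, bool) and spec["type"] != "boolean":
            ok = False
        else:
            ok = isinstance(value, _PY_TYPES[spec["type"]])
        if not ok:
            errors.append(
                f"{field}: expected {spec['type']}, got {type(value).__name__}"
            )
    return errors


# Hypothetical extracted record, used only to exercise the check.
record = {"people_count": 3, "location": "lobby", "suspicious_activity": False}
print(type_errors(record, SCHEMA))
```

For production use, a complete validator that implements draft 2020-12 is the safer choice; this sketch only covers the three primitive types used in the example.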

Pitfalls

  • Overly specific schemas can limit extraction. If the schema is too specific, Jockey may miss relevant content. Start broad, then narrow the schema as you learn what you need.
  • General descriptions can lead to broad extraction. If the description is too general, Jockey may return results that are broader than you need. Name the fields, entities, or patterns you want Jockey to emphasize.
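One way to keep a description from drifting too general is to build it from an explicit list of the signals you care about. The helper below is a hypothetical convenience, not part of any Jockey SDK; it only formats a string in the same style as the natural language example in this guide.

```python
def focus_description(signals):
    """Join named signals into a single 'Focus on ...' description."""
    signals = [s.strip() for s in signals if s.strip()]
    if not signals:
        raise ValueError("name at least one field, entity, or pattern")
    if len(signals) == 1:
        listed = signals[0]
    elif len(signals) == 2:
        listed = " and ".join(signals)
    else:
        # Three or more items: comma-separated with a final ", and".
        listed = ", ".join(signals[:-1]) + ", and " + signals[-1]
    return "Focus on " + listed


description = focus_description(
    ["brand mentions", "product appearances", "audience reactions", "visual tone"]
)
print(description)
```

Keeping the signals in a list also makes it easy to add or drop one as you learn what the extraction actually needs.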

Jupyter notebook

Download the notebook to run this guide interactively.

API reference