Segment videos
Transform raw video into structured, timestamped data. Define the types of segments you want to detect, such as editorial narratives, sports plays, speaker changes, or brand appearances, and the fields you want to extract for each one. The platform automatically identifies segment boundaries and returns your custom metadata for each segment in JSON format.
Key features:
- Define your own segments: Describe the types of segments you want to extract using natural language descriptions and custom fields.
- Multimodal prompting: Combine text descriptions with reference images to show the model what to detect. Attach product photos, brand logos, or person images to your segment definitions and reference them by name in your description.
- Extract structured metadata: Receive typed fields for each detected segment.
- Get precise timestamps: Each segment includes a start and end time.
- Analyze multiple segment types: Submit up to 10 segment definitions in a single request to extract different types of metadata.
- Control segment duration: Set minimum and maximum segment durations to match your content structure.
Use cases:
- Segment news broadcasts: Identify editorial narratives, extract topics, and tag named individuals across hours of footage.
- Detect sports highlights: Find scoring plays and key moments with structured metadata for each event.
- Build editorial workflows: Extract titles, summaries, and subject tags for each segment to populate content management systems.
- Create scene-level metadata: Detect scene changes and extract sentiment, visible objects, and activity descriptions.
For details on how your usage is measured and billed, see the Pricing page.
Key concepts
- Segment definition: A description of a type of segment you want to extract. Each definition includes a unique identifier, a natural language description that specifies what to look for, and optional custom fields. You can submit up to 10 definitions per request.
- Segment field: A custom metadata field to extract for each segment. Fields have a name, a type (string, boolean, number, integer, or array), and a description that specifies what to extract. You can define up to 20 fields per segment definition.
- Media sources: Reference images that provide visual context for segment detection. Attach images to a segment definition using the media_sources array, then reference them by name in your description using angle brackets (Example: <@product_logo>). You can attach up to 4 media sources per segment definition.
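Before submitting a request, you can check your segment definitions against the limits listed above on the client side. The following Python sketch does that. It assumes each definition is a plain dict using the id, description, fields, and media_sources keys described in this guide; the function name validate_segment_definitions is hypothetical, and the platform remains the authority on what it accepts.

```python
from typing import Any

# Limits and field types as stated in this guide.
MAX_DEFINITIONS = 10
MAX_FIELDS_PER_DEFINITION = 20
MAX_MEDIA_SOURCES_PER_DEFINITION = 4
ALLOWED_FIELD_TYPES = {"string", "boolean", "number", "integer", "array"}


def validate_segment_definitions(definitions: list[dict[str, Any]]) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems: list[str] = []
    if len(definitions) > MAX_DEFINITIONS:
        problems.append(f"{len(definitions)} definitions exceed the limit of {MAX_DEFINITIONS}")
    seen_ids: set[str] = set()
    for index, definition in enumerate(definitions):
        def_id = definition.get("id")
        label = def_id or f"definition[{index}]"
        # Each definition needs a unique identifier; it becomes a top-level response key.
        if not def_id:
            problems.append(f"{label}: missing id")
        elif def_id in seen_ids:
            problems.append(f"{label}: duplicate id")
        else:
            seen_ids.add(def_id)
        if not definition.get("description"):
            problems.append(f"{label}: missing description")
        fields = definition.get("fields", [])
        if len(fields) > MAX_FIELDS_PER_DEFINITION:
            problems.append(
                f"{label}: {len(fields)} fields exceed the limit of {MAX_FIELDS_PER_DEFINITION}"
            )
        for field in fields:
            name = field.get("name", "<unnamed>")
            if field.get("type") not in ALLOWED_FIELD_TYPES:
                problems.append(f"{label}.{name}: unsupported type {field.get('type')!r}")
            if not field.get("description"):
                problems.append(f"{label}.{name}: missing description")
        sources = definition.get("media_sources", [])
        if len(sources) > MAX_MEDIA_SOURCES_PER_DEFINITION:
            problems.append(
                f"{label}: {len(sources)} media sources exceed the limit of "
                f"{MAX_MEDIA_SOURCES_PER_DEFINITION}"
            )
    return problems


scene_definition = {
    "id": "scenes",
    "description": "A continuous scene with a consistent location and camera setup.",
    "fields": [
        {
            "name": "camera_angle",
            "type": "string",
            "description": "The dominant camera angle, such as wide, medium, or close-up.",
        },
        {
            "name": "activity",
            "type": "string",
            "description": "A one-sentence description of the main activity in the scene.",
        },
    ],
}
print(validate_segment_definitions([scene_definition]))  # []
```

Returning a list of problems rather than raising on the first one lets you fix every issue in a single pass.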
Workflow
This guide shows how to write segment definitions, create an asynchronous analysis task with Pegasus 1.5, and parse the timestamped metadata from the results.
Prerequisites
- You already know how to analyze videos asynchronously. For instructions, see the Analyze videos guide.
- You have uploaded a video as an asset. The examples in this guide use an asset ID.
Examples
The examples in this section show different segment definitions for common use cases.
Scene analysis
Editorial metadata
Detect scene changes and extract descriptive metadata for each scene, including camera angles and activities.
To provide visual context for segment detection, add a media_sources array with up to 4 images to your segment definition and reference them by name in your description using angle brackets (Example: <@product_logo>).
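Because a description can only refer to images you actually attached, a quick check for unmatched <@name> references can catch typos early. This is a minimal sketch; it assumes each entry in media_sources is a dict whose reference name is stored under a name key and whose image location is stored under a url key. Both key names are assumptions rather than confirmed parts of the API, as is the helper name unresolved_media_references.

```python
import re
from typing import Any

# Matches references such as <@product_logo> and captures the name inside.
REFERENCE_PATTERN = re.compile(r"<@([A-Za-z0-9_-]+)>")


def unresolved_media_references(definition: dict[str, Any]) -> list[str]:
    """Return names referenced in the description that have no attached media source."""
    referenced = set(REFERENCE_PATTERN.findall(definition.get("description", "")))
    # Assumption: each media source stores its reference name under "name".
    attached = {source.get("name") for source in definition.get("media_sources", [])}
    return sorted(referenced - attached)


definition = {
    "id": "logo_appearances",
    "description": "Segments where <@product_logo> is clearly visible on screen.",
    "media_sources": [
        # Hypothetical shape: "url" stands in for however the image is actually supplied.
        {"name": "product_logo", "url": "https://example.com/product_logo.png"},
    ],
}
print(unresolved_media_references(definition))  # []
```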
Response format
The result.data field is a JSON-encoded string. Parse it with json.loads() in Python or JSON.parse() in Node.js before accessing the data. Every response follows the same general structure.
Note the following about the general structure:
- Top-level keys match the values of the id field from your segment definitions.
- The start_time and end_time fields are in seconds (Example: 45.0).
- Custom fields are nested under the metadata object for each segment.
- Multiple definitions each produce their own top-level key in the response.
- Segments within a definition do not overlap in time.
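Putting those rules together, the following Python sketch decodes result.data, groups segments by definition id, and confirms that segments within each definition do not overlap. It assumes each top-level key maps to a list of segment objects with start_time, end_time, and metadata keys; this guide does not show the exact shape, so treat that as an assumption. The sample payload is made up for illustration, and parse_segments and find_overlaps are hypothetical helper names.

```python
import json
from typing import Any


def parse_segments(result_data: str) -> dict[str, list[dict[str, Any]]]:
    """Decode result.data and sort each definition's segments by start time."""
    decoded = json.loads(result_data)  # result.data is a JSON-encoded string
    return {
        definition_id: sorted(segments, key=lambda segment: segment["start_time"])
        for definition_id, segments in decoded.items()
    }


def find_overlaps(segments: list[dict[str, Any]]) -> list[tuple[dict[str, Any], dict[str, Any]]]:
    """Return consecutive pairs whose time ranges overlap (expects input sorted by start_time)."""
    return [
        (previous, current)
        for previous, current in zip(segments, segments[1:])
        if current["start_time"] < previous["end_time"]
    ]


# Made-up payload in the assumed shape: definition id -> list of segments.
sample = json.dumps({
    "scenes": [
        {"start_time": 45.0, "end_time": 80.5, "metadata": {"camera_angle": "close-up"}},
        {"start_time": 0.0, "end_time": 45.0, "metadata": {"camera_angle": "wide"}},
    ]
})

for definition_id, segments in parse_segments(sample).items():
    for segment in segments:
        print(definition_id, segment["start_time"], segment["end_time"], segment["metadata"])
    print("overlaps:", len(find_overlaps(segments)))
```

Sorting first means the overlap check only has to compare neighbors, and segments that merely share a boundary (one ends at 45.0, the next starts at 45.0) are not counted as overlapping.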
Example responses
These responses correspond to the examples above, trimmed to the first two segments for brevity:
Scene analysis
Editorial metadata
Best practices
- Write specific field descriptions: The description field controls what the platform extracts. Be specific about the expected content and format.
- Use enum for categorical fields: When a field has a known set of values, list them with enum so that responses use only those values.
- Start with fewer fields: Begin with 2-3 fields per segment definition and add more as needed.
- Set segment duration constraints: Use min_segment_duration to avoid very short segments and max_segment_duration to limit how long a single segment can be.
- Use meaningful segment definition identifiers: The id value becomes the top-level key in your response JSON, so use descriptive identifiers.
- Handle truncated responses: If the finish_reason field in the response is length instead of stop, the JSON may be incomplete. Increase max_tokens or reduce the number of fields and definitions.
- Keep segment definitions focused: Each definition should target one type of segment. Create separate definitions for different analysis goals.
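If you set duration constraints or enum values, you can sanity-check the parsed segments against them on the client side. The sketch below assumes segments are dicts with start_time, end_time (in seconds), and metadata keys, and that you pass in the same duration bounds and enum lists you used in your segment definition. The helper check_segment_constraints is hypothetical and only reports issues; it does not change the results.

```python
from typing import Any, Optional


def check_segment_constraints(
    segments: list[dict[str, Any]],
    min_duration: Optional[float] = None,
    max_duration: Optional[float] = None,
    enums: Optional[dict[str, list[Any]]] = None,
) -> list[str]:
    """Report segments that violate the duration bounds or enum lists you configured."""
    issues: list[str] = []
    for index, segment in enumerate(segments):
        duration = segment["end_time"] - segment["start_time"]  # seconds
        if min_duration is not None and duration < min_duration:
            issues.append(f"segment {index}: {duration:.1f}s is shorter than {min_duration}s")
        if max_duration is not None and duration > max_duration:
            issues.append(f"segment {index}: {duration:.1f}s is longer than {max_duration}s")
        metadata = segment.get("metadata", {})
        for field_name, allowed in (enums or {}).items():
            if field_name in metadata and metadata[field_name] not in allowed:
                issues.append(
                    f"segment {index}: {field_name}={metadata[field_name]!r} is not an allowed value"
                )
    return issues


plays = [
    {"start_time": 0.0, "end_time": 3.0, "metadata": {"play_type": "goal"}},
    {"start_time": 3.0, "end_time": 20.0, "metadata": {"play_type": "penalty"}},
]
print(check_segment_constraints(plays, min_duration=5.0, max_duration=60.0,
                                enums={"play_type": ["goal", "penalty"]}))
# ['segment 0: 3.0s is shorter than 5.0s']
```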
Troubleshooting
Prompt not allowed with video segmentation
Problem: You receive a 400 error when including the prompt parameter.
Cause: The prompt parameter is not supported when the analysis_mode parameter is set to time_based_metadata. Segment definitions and field descriptions serve as instructions instead.
Solution: Remove the prompt parameter from your request. Use the description field in your segment definitions and on individual fields to specify what to extract.
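If you reuse request-building code from the general analysis workflow, a small guard can drop the prompt parameter whenever analysis_mode is time_based_metadata. This sketch touches only the prompt and analysis_mode keys named in this guide and treats the rest of the request as an opaque dict; the helper name strip_prompt_for_segmentation is hypothetical.

```python
import warnings
from typing import Any


def strip_prompt_for_segmentation(request: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the request without prompt when segmentation mode is selected."""
    cleaned = dict(request)  # leave the caller's dict untouched
    if cleaned.get("analysis_mode") == "time_based_metadata" and "prompt" in cleaned:
        warnings.warn(
            "prompt is not supported with time_based_metadata; "
            "move the instructions into your segment and field descriptions"
        )
        del cleaned["prompt"]
    return cleaned
```

Emitting a warning instead of silently deleting the key makes it visible that instructions from the prompt need to move into your descriptions.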
Missing required field description
Problem: You receive a validation error about a missing field description.
Cause: Every field in a segment definition requires a description field that specifies what to extract.
Solution: Add a description field to every entry in your fields array.
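For example, the following Python sketch shows a fields array in which every entry has a name, a type, and a description, followed by a local check for entries that lack one. The field names and values are illustrative, and placing enum directly on the field object is an assumption based on the best practices in this guide.

```python
# Every entry needs a name, a type, and a description.
fields = [
    {
        "name": "topic",
        "type": "string",
        "description": "The main topic of the segment in five words or fewer.",
    },
    {
        "name": "sentiment",
        "type": "string",
        "description": "The overall sentiment of the segment.",
        "enum": ["positive", "neutral", "negative"],  # assumed placement of enum
    },
    {
        "name": "has_on_screen_text",
        "type": "boolean",
        "description": "Whether any text is visible on screen during the segment.",
    },
]

# Catch missing descriptions locally before the request fails validation.
missing = [field.get("name", "<unnamed>") for field in fields if not field.get("description")]
if missing:
    raise ValueError(f"Fields missing a description: {missing}")
```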
Truncated or invalid JSON in response
Problem: Calls to the json.loads() function in Python or the JSON.parse() function in Node.js fail when parsing the result.data field.
Cause: The response exceeded the token limit set by the max_tokens parameter (32,768 by default). When the finish_reason field is length, the returned JSON may be incomplete.
Solution: Increase the value of the max_tokens parameter, reduce the number of segment definitions or fields, or split your analysis across multiple requests.
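One way to apply the first remedy is to retry with a larger max_tokens value whenever finish_reason is length. The following Python sketch assumes a hypothetical submit callable that runs the analysis with a given max_tokens value and returns a dict containing finish_reason and the raw result.data string under a data key. This guide does not state an upper bound for max_tokens, so the ceiling argument is a value you must choose.

```python
import json
from typing import Any, Callable

DEFAULT_MAX_TOKENS = 32_768  # default stated in this guide


def analyze_with_retry(
    submit: Callable[[int], dict[str, Any]],
    ceiling: int,
    max_tokens: int = DEFAULT_MAX_TOKENS,
) -> dict[str, Any]:
    """Call submit(max_tokens), doubling max_tokens while finish_reason is "length"."""
    while True:
        response = submit(max_tokens)  # hypothetical shape: {"finish_reason": ..., "data": ...}
        if response.get("finish_reason") != "length":
            return json.loads(response["data"])
        if max_tokens * 2 > ceiling:
            raise RuntimeError(
                f"Output still truncated at max_tokens={max_tokens}; "
                "reduce fields or definitions, or split the request"
            )
        max_tokens *= 2


def fake_submit(max_tokens: int) -> dict[str, Any]:
    """Stand-in for a real API call that truncates below 65,536 tokens."""
    if max_tokens < 65_536:
        return {"finish_reason": "length", "data": '{"scenes": ['}
    return {"finish_reason": "stop", "data": '{"scenes": []}'}


print(analyze_with_retry(fake_submit, ceiling=131_072))  # {'scenes': []}
```

Doubling is an arbitrary growth policy. If the output is still truncated at your ceiling, reducing the number of fields or splitting the segment definitions across multiple requests are the other remedies listed above.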