Segment videos

Transform raw video into structured, timestamped data. Define the types of segments you want to detect, such as editorial narratives, sports plays, speaker changes, or brand appearances, along with the fields you want to extract. The platform automatically identifies segment boundaries and returns custom metadata for each segment in JSON format.

Key features:

  • Define your own segments: Describe the types of segments you want to extract using natural language descriptions and custom fields.
  • Multimodal prompting: Combine text descriptions with reference images to show the model what to detect. Attach product photos, brand logos, or person images to your segment definitions and reference them by name in your description.
  • Extract structured metadata: Receive typed fields for each detected segment.
  • Get precise timestamps: Each segment includes a start and end time.
  • Analyze multiple segment types: Submit up to 10 segment definitions in a single request to extract different types of metadata.
  • Control segment duration: Set minimum and maximum segment durations to match your content structure.

Use cases:

  • Segment news broadcasts: Identify editorial narratives, extract topics, and tag named individuals across hours of footage.
  • Detect sports highlights: Find scoring plays and key moments with structured metadata for each event.
  • Build editorial workflows: Extract titles, summaries, and subject tags for each segment to populate content management systems.
  • Create scene-level metadata: Detect scene changes and extract sentiment, visible objects, and activity descriptions.

For details on how your usage is measured and billed, see the Pricing page.

Key concepts

  • Segment definition: A description of a type of segment you want to extract. Each definition includes a unique identifier, a natural language description that specifies what to look for, and optional custom fields. You can submit up to 10 definitions per request. A minimal example follows this list.
  • Segment field: A custom metadata field to extract for each segment. Fields have a name, type (string, boolean, number, integer, or array), and a description that specifies what to extract. You can define up to 20 fields per segment definition.
  • Media source: A reference image that provides visual context for segment detection. Attach images to a segment definition using the media_sources array, then reference them by name in your description using angle brackets (Example: <@product_logo>). You can attach up to 4 media sources per segment definition.
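
For example, a minimal segment definition for the sports use case might look like the following sketch. The identifier, description, and field names are illustrative choices, not required values:

```python
# A hypothetical segment definition. The id, description, and field names
# are illustrative; only the structure (id, description, fields with
# name/type/description) is required.
scoring_plays_definition = {
    "id": "scoring_plays",
    "description": "Segments where a team scores points",
    "fields": [
        {
            "name": "play_type",
            "type": "string",
            "description": "The type of scoring play",
        },
        {
            "name": "is_highlight_worthy",
            "type": "boolean",
            "description": "Whether this play is a standout moment",
        },
    ],
}
```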

Workflow

This guide shows how to create segment definitions, submit an asynchronous analysis task with Pegasus 1.5, and parse the timestamped metadata from the results.

Prerequisites

  • You already know how to analyze videos asynchronously. For instructions, see the Analyze videos guide.
  • You have uploaded a video as an asset. The examples in this guide use an asset ID.

Examples

The examples in this section show different segment definitions for common use cases.

The following example detects scene changes and extracts descriptive metadata for each scene, including camera angles and activities.

```python
import json
import time

from twelvelabs import TwelveLabs
from twelvelabs.types import AsyncResponseFormat, VideoContext_AssetId

client = TwelveLabs(api_key="<YOUR_API_KEY>")

# Create an asynchronous analysis task that segments the video into scenes
task = client.analyze_async.tasks.create(
    video=VideoContext_AssetId(asset_id="<YOUR_ASSET_ID>"),
    model_name="pegasus1.5",
    analysis_mode="time_based_metadata",
    # temperature=0.2,  # Optional: Controls output randomness (0.0-1.0). Default: 0.2
    # max_tokens=32768,  # Optional: Maximum tokens in the response (2,048–32,768). Default: 32,768
    # min_segment_duration=5.0,  # Optional: Minimum segment length in seconds. Minimum: 2
    # max_segment_duration=60.0,  # Optional: Maximum segment length in seconds
    response_format=AsyncResponseFormat(
        type="segment_definitions",
        segment_definitions=[
            {
                "id": "scenes",
                "description": "Segment the video into distinct scenes based on changes in setting, topic, or visual composition",
                "fields": [
                    {
                        "name": "description",
                        "type": "string",
                        "description": "A brief description of what happens in this scene",
                    },
                    {
                        "name": "camera_angle",
                        "type": "string",
                        "description": "The primary camera angle used in this scene",
                        "enum": ["wide", "medium", "close_up", "overhead"],
                    },
                    {
                        "name": "activity",
                        "type": "string",
                        "description": "The main activity taking place in this scene",
                    },
                ],
            }
        ],
    ),
)
print(f"Task ID: {task.task_id}")

# Poll until the task completes
while True:
    task = client.analyze_async.tasks.retrieve(task.task_id)
    if task.status == "ready":
        break
    elif task.status == "failed":
        raise RuntimeError("Task failed")
    time.sleep(5)

# result.data is a JSON-encoded string; parse it before accessing segments
data = json.loads(task.result.data)
for segment in data["scenes"]:
    print(f"\n[{segment['start_time']:.1f}s - {segment['end_time']:.1f}s]")
    meta = segment["metadata"]
    print(f"  Description: {meta['description']}")
    print(f"  Camera angle: {meta['camera_angle']}")
    print(f"  Activity: {meta['activity']}")
```

To provide visual context for segment detection, add a media_sources array with up to 4 images to your segment definition and reference them by name in your description using angle brackets (Example: <@product_logo>).

```json
{
  "id": "branded_segments",
  "description": "Segments where <@product_logo> appears on screen",
  "media_sources": [
    {
      "name": "product_logo",
      "media_type": "image",
      "url": "https://example.com/logo.png"
    }
  ],
  "fields": [ ... ]
}
```

Response format

The result.data field is a JSON-encoded string. Parse it with json.loads() in Python or JSON.parse() in Node.js before accessing the data. Every response follows this general structure:

```json
{
  "<segment_definition_id>": [
    {
      "start_time": 0.0,
      "end_time": 45.0,
      "metadata": {
        "<field_name>": "<value>",
        "<field_name>": true,
        "<field_name>": ["item1", "item2"]
      }
    }
  ]
}
```

Note the following about the general structure:

  • Top-level keys match the values of the id field from your segment definitions.
  • The start_time and end_time fields are in seconds (Example: 45.0).
  • Custom fields are nested under the metadata object for each segment.
  • Multiple definitions each produce their own top-level key in the response.
  • Segments within a definition do not overlap in time.
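
Because the top-level keys mirror your segment definition identifiers, you can walk the entire response generically. A minimal sketch, reusing the task object from the workflow example above:

```python
import json

# result.data is a JSON-encoded string; parse it first
data = json.loads(task.result.data)

# Each top-level key is a segment definition id; each value is a list of
# non-overlapping segments with timestamps and custom metadata.
for definition_id, segments in data.items():
    print(f"{definition_id}: {len(segments)} segment(s)")
    for segment in segments:
        print(f"  [{segment['start_time']:.1f}s - {segment['end_time']:.1f}s]")
        for field_name, value in segment["metadata"].items():
            print(f"    {field_name}: {value}")
```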

Example response

This response corresponds to the scene detection example above, trimmed to the first two segments for brevity:

```json
{
  "scenes": [
    {
      "start_time": 0.0,
      "end_time": 45.0,
      "metadata": {
        "description": "Steve Jobs walks onto the stage and begins his speech, setting the stage for the introduction of revolutionary products.",
        "camera_angle": "wide",
        "activity": "Steve Jobs walking onto the stage and starting his speech"
      }
    },
    {
      "start_time": 45.0,
      "end_time": 65.0,
      "metadata": {
        "description": "Steve Jobs discusses the introduction of the Macintosh in 1984 and its impact on the computer industry.",
        "camera_angle": "medium",
        "activity": "Steve Jobs talking about the Macintosh"
      }
    }
  ]
}
```

Best practices

  • Write specific field descriptions: The description field controls what the platform extracts. Be specific about the expected content and format.
  • Use enum for categorical fields: When a field has a known set of values, list them with enum so that responses use only those values.
  • Start with fewer fields: Begin with 2-3 fields per segment definition and add more as needed.
  • Set segment duration constraints: Use min_segment_duration to avoid very short segments and max_segment_duration to limit how long a single segment can be.
  • Use meaningful segment definition identifiers: The id value becomes the top-level key in your response JSON, so use descriptive identifiers.
  • Handle truncated responses: If the finish_reason field in the response is length instead of stop, the JSON may be incomplete. Increase max_tokens or reduce the number of fields and definitions.
  • Keep segment definitions focused: Each definition should target one type of segment. Create separate definitions for different analysis goals.

Troubleshooting

Prompt not allowed with video segmentation

Problem: You receive a 400 error when including the prompt parameter.

Cause: The prompt parameter is not supported when the analysis_mode parameter is set to time_based_metadata. Segment definitions and field descriptions serve as instructions instead.

Solution: Remove the prompt parameter from your request. Use the description field in your segment definitions and on individual fields to specify what to extract.
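
As a sketch, using the same SDK calls as the workflow example above (the scoring_plays definition here is illustrative):

```python
# Not allowed: prompt cannot be combined with time_based_metadata
# task = client.analyze_async.tasks.create(
#     video=VideoContext_AssetId(asset_id="<YOUR_ASSET_ID>"),
#     model_name="pegasus1.5",
#     analysis_mode="time_based_metadata",
#     prompt="Find the scoring plays",  # causes a 400 error
#     response_format=...,
# )

# Instead, put the instructions in the segment definition's description
task = client.analyze_async.tasks.create(
    video=VideoContext_AssetId(asset_id="<YOUR_ASSET_ID>"),
    model_name="pegasus1.5",
    analysis_mode="time_based_metadata",
    response_format=AsyncResponseFormat(
        type="segment_definitions",
        segment_definitions=[
            {
                "id": "scoring_plays",
                "description": "Segments where a team scores points",
                "fields": [
                    {
                        "name": "play_type",
                        "type": "string",
                        "description": "The type of scoring play",
                    }
                ],
            }
        ],
    ),
)
```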

Missing required field description

Problem: You receive a validation error about a missing field description.

Cause: Every field in a segment definition requires a description field that specifies what to extract.

Solution: Add a description field to every entry in your fields array:

```json
{
  "name": "sentiment",
  "type": "string",
  "description": "The overall sentiment of this segment"
}
```

Truncated or invalid JSON in response

Problem: Calls to the json.loads() function in Python or the JSON.parse() function in Node.js fail when parsing the result.data field.

Cause: The response exceeded the token limit set by the max_tokens parameter (32,768 by default). When the finish_reason field is length, the returned JSON may be incomplete.

Solution: Increase the value of the max_tokens parameter, reduce the number of segment definitions or fields, or split your analysis across multiple requests.
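
As a sketch, you can check for truncation before parsing. This assumes the SDK exposes finish_reason on the task result; the attribute name may differ in your SDK version:

```python
import json

task = client.analyze_async.tasks.retrieve(task.task_id)

# A finish_reason of "length" means the response hit the token limit,
# so the JSON may be incomplete. (Attribute location assumed here.)
if getattr(task.result, "finish_reason", None) == "length":
    print("Response truncated: retry with a higher max_tokens, "
          "or reduce the number of definitions and fields")
else:
    data = json.loads(task.result.data)
```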