The platform uses a multimodal approach to analyze videos and generate text. It processes visuals, sounds, spoken words, and text together, which captures nuances that unimodal interpretation might miss and produces accurate, context-rich text based on the video content.

Key features:

- Multimodal analysis: Processes visuals, sounds, spoken words, and text for a holistic understanding of video content.
- Customizable prompts: Allows tailored outputs through instructive, descriptive, or question-based prompts.
- Flexible text generation: Supports various tasks, including summarization, chaptering, and open-ended text generation.
- Segment videos: Extract structured, timestamped segments from your videos by defining custom segment types and fields.

Use cases:

- Content structuring: Organize and structure content for e-learning platforms to improve usability.
- SEO optimization: Optimize content to rank higher in search engine results.
- Highlight creation: Create short, engaging video clips for media and broadcasting.
- Incident reporting: Record and report incidents for security and law enforcement purposes.
- Generative video QA: Check AI-generated clips against their prompt or flag visual defects before you use them.

On the Free plan, analyzed video hours count toward a shared limit that also covers indexing. On paid plans, you pay based on how much video you process and how many segment definitions you include. For examples, see the Frequently asked questions page.

For details on how your usage is measured and billed, see the Pricing page.

Key concepts

This section explains the key concepts and terminology used in this guide:

Asset: Your uploaded content. Once created, you can reference the same asset across multiple operations without uploading the file again.
Analysis task: An asynchronous operation that processes your video and generates text. It contains a status and, once processing completes, the resulting text.

Workflow

This guide shows how to upload your video as an asset and analyze it asynchronously. You can also pass a URL or base64-encoded data directly to the analysis call instead of creating an asset.
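If you pass base64-encoded data instead of creating an asset, you need to encode the raw video bytes first. The sketch below uses only the Python standard library. The helper names `encode_video_bytes` and `encode_video_file` are hypothetical and are not part of the SDK.

```python
import base64
from pathlib import Path


def encode_video_bytes(data: bytes) -> str:
    """Return the base64 encoding of raw bytes as an ASCII string."""
    return base64.b64encode(data).decode("ascii")


def encode_video_file(path: str) -> str:
    """Read a local file and return its contents base64-encoded.

    The whole file is loaded into memory, so this only suits small files.
    """
    return encode_video_bytes(Path(path).read_bytes())
```

The returned string is the kind of value the Complete example passes as `base64_string` in `VideoContext_Base64String`.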

For videos under 1 hour, synchronous processing returns results immediately without polling and also supports streaming responses. For an example, see the Short videos (synchronous) section. Both modes accept the same input formats (asset ID, URL, or base64). For a full comparison, see Processing modes.

Customize text generation

You can configure the temperature to control output randomness, set the maximum response length, and request structured JSON responses for programmatic processing. To extract timestamped segments with custom fields, see the Segment videos page.

Prerequisites

To use the platform, you need an API key:

1. If you don’t have an account, sign up for a free account.
2. Go to the API Keys page.
3. If you need to create a new key, select the Create API Key button. Enter a name and set the expiration period. The default is 12 months.
4. Select the Copy icon next to your key to copy it to your clipboard.

Install the TwelveLabs SDK. For Python, enter the following command:
```
$ pip install twelvelabs
```
Your video files must meet the following requirements:
- Upload limits: Public video URLs up to 2 GB or local video files up to 200 MB. For local files up to 2 GB, see the Upload and processing methods page.
- Analysis method: Videos up to 2 hours (asynchronous approach). For videos under 1 hour, see the synchronous approach below.
- Model capabilities: See the complete requirements for resolution, aspect ratio, and supported formats.

Complete example

Copy and paste the code below, replacing the placeholders surrounded by <> with your values.

```
import time
from twelvelabs import TwelveLabs
from twelvelabs.types import VideoContext_AssetId, VideoContext_Url, VideoContext_Base64String, AnalyzePromptV2, SmeMediaSource

# 1. Initialize the client
client = TwelveLabs(api_key="<YOUR_API_KEY>")

# 2. Upload a video
asset = client.assets.create(
    method="url",
    url="<YOUR_VIDEO_URL>",  # Use direct links to raw media files. Video hosting platforms and cloud storage sharing links are not supported
    # Or use method="direct" and file=open("<PATH_TO_VIDEO_FILE>", "rb") to upload a local file up to 200 MB
)
print(f"Created asset: id={asset.id}")

# 3. Check the status of the asset
print("Waiting for asset to be ready...")
while True:
    asset = client.assets.retrieve(asset.id)
    if asset.status == "ready":
        print("Asset is ready")
        break
    if asset.status == "failed":
        raise RuntimeError(f"Asset processing failed: id={asset.id}")
    time.sleep(5)

# 4. Analyze your video
video = VideoContext_AssetId(asset_id=asset.id)
# Or instead of creating an asset, pass video inline:
# video = VideoContext_Url(url="<YOUR_VIDEO_URL>")
# video = VideoContext_Base64String(base64_string="<YOUR_BASE64_DATA>")
task = client.analyze_async.tasks.create(
    model_name="pegasus1.5",
    video=video,
    prompt_v_2=AnalyzePromptV2(
        input_text="<YOUR_PROMPT>",  # To use reference images: "Is there a <@product> in this video?"
        # media_sources=[
        #     SmeMediaSource(name="product", media_type="image", url="<YOUR_IMAGE_URL>"),
        # ],
    ),
    # temperature=0.2,
    # max_tokens=1024,
    # You can also use `response_format` to request structured JSON responses
)
print(f"Task ID: {task.task_id}")

# 5. Monitor the status
while True:
    task = client.analyze_async.tasks.retrieve(task.task_id)

    if task.status == "ready":
        print("Task completed")
        break
    elif task.status == "failed":
        # Stop here: a failed task has no result to print in step 6
        raise RuntimeError(f"Analysis task failed: id={task.task_id}")
    else:
        print("Task still processing...")
        time.sleep(5)

# 6. Process the results
print(task.result.data)
```

Code explanation

Import the SDK and initialize the client

Create a client instance to interact with the TwelveLabs Video Understanding Platform.
Function call: You call the constructor of the TwelveLabs class.
Parameters:

api_key: The API key to authenticate your requests to the platform.

Return value: An object of type TwelveLabs configured for making API calls.

Upload a video

Upload a video to create an asset.
Function call: You call the assets.create function.
Parameters:

method: The upload method for your asset. Use url for a publicly accessible URL or direct to upload a local file. This example uses url.
url or file: The publicly accessible URL of your video or an opened file object in binary read mode. This example uses url.

Return value: An object of type Asset. This object contains, among other information, a field named id representing the unique identifier of your asset.

Note

For local files larger than 200 MB, use multipart uploads. Multipart uploads support automatic retry, progress tracking, parallel chunk uploads, and improved reliability, performance, and observability.
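The size limits for local files can be checked before you start an upload. The sketch below is a hypothetical pre-flight helper, not an SDK function. It assumes decimal units (1 MB = 1,000,000 bytes), which this guide does not specify, and the `"multipart"` label is only an illustrative return value, not a value of the `method` parameter.

```python
# Assumption: limits are interpreted in decimal units.
DIRECT_LIMIT_BYTES = 200 * 1000 * 1000          # 200 MB, direct upload
MULTIPART_LIMIT_BYTES = 2 * 1000 * 1000 * 1000  # 2 GB, multipart upload


def choose_upload_method(size_bytes: int) -> str:
    """Pick an upload path for a local file based on its size in bytes."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes <= DIRECT_LIMIT_BYTES:
        return "direct"
    if size_bytes <= MULTIPART_LIMIT_BYTES:
        return "multipart"  # illustrative label only
    raise ValueError("file exceeds the 2 GB limit for local uploads")
```

For a real file, `os.path.getsize(path)` gives the size to pass in.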

Check the status of the asset

Asset processing is asynchronous. Poll the status of the asset until it is ready before you use it.
Function call: You call the assets.retrieve function.
Parameters:

asset_id: The unique identifier of your asset.

Return value: An object of type Asset containing, among other information, a field named status representing the current status of the asset. Check this field until its value is ready.

Analyze your video

Create an analysis task to start processing your video. This operation is asynchronous.
Function call: You call the analyze_async.tasks.create method.
Parameters:

video: An object that specifies the source of the video. Provide one of the following:
- asset_id: The unique identifier of an asset from a previous upload.
- url: The publicly accessible URL of the video file.
- base64_string: The base64-encoded video data.
This example uses the asset ID from the previous step.
prompt_v_2: A structured prompt. Set input_text to your prompt text. To include reference images, add entries to media_sources and use <@name> placeholders in input_text. See the commented lines in the code example above.
(Optional) temperature: Controls the randomness of the text output. A higher value generates more creative text, while a lower value produces more deterministic output.
(Optional) max_tokens: The maximum response length, in tokens.
(Optional) response_format: Use this parameter to request structured JSON responses. For instructions, examples, and best practices, see the Structured responses page.

Return value: An object of type CreateAnalyzeTaskResponse containing a field named task_id, which represents the unique identifier of your analysis task. You can use this identifier to track the status of your task.
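When a prompt uses `<@name>` placeholders, it can help to confirm that each placeholder has a matching entry in `media_sources` before you create the task. The sketch below is a hypothetical check, not part of the SDK. The set of characters allowed in a placeholder name is an assumption.

```python
import re

# Assumption: placeholder names consist of letters, digits, "_" and "-".
PLACEHOLDER = re.compile(r"<@([A-Za-z0-9_\-]+)>")


def missing_media_sources(input_text: str, source_names: list[str]) -> list[str]:
    """Return placeholder names in input_text that have no matching source name.

    Names are reported once each, in order of first appearance.
    """
    known = set(source_names)
    missing: list[str] = []
    for name in PLACEHOLDER.findall(input_text):
        if name not in known and name not in missing:
            missing.append(name)
    return missing
```

An empty list means every placeholder in the prompt refers to a name you supplied.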

Monitor the status

The platform requires some time to process videos. Poll the status of the analysis task until processing completes. This example uses a loop to check the status every 5 seconds.
Function call: You repeatedly call the analyze_async.tasks.retrieve method until the task completes.

Parameters:

task_id: The unique identifier of your analysis task.

Return value: An object of type AnalyzeTaskResponse containing, among other information, the following fields:

status: The current status of the task. The possible values are:
- queued: The task is waiting to be processed.
- pending: The task is queued and waiting to start.
- processing: The platform is analyzing the video.
- ready: Processing is complete. Results are available in the result field.
- failed: The task failed.
result: When the status is ready, this field contains the generated text and usage information.
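The polling loop in the Complete example runs until the task reaches ready or failed. In production you may also want an upper bound on the wait. The sketch below is one way to do that, assuming only that the retrieved object exposes a `status` attribute as described above. The helper name `wait_for_task` and the default one-hour timeout are assumptions, not SDK behavior.

```python
import time
from typing import Any, Callable

# Statuses after which polling should stop, per the list above.
TERMINAL_STATUSES = {"ready", "failed"}


def wait_for_task(
    retrieve: Callable[[], Any],
    interval: float = 5.0,
    timeout: float = 3600.0,  # assumed default; tune for your videos
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Any:
    """Call retrieve() until the task is ready or failed, or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        task = retrieve()
        if task.status in TERMINAL_STATUSES:
            return task
        if clock() >= deadline:
            raise TimeoutError(f"task still '{task.status}' after {timeout} seconds")
        sleep(interval)


# Usage with the client from the Complete example (not executed here):
# task = wait_for_task(lambda: client.analyze_async.tasks.retrieve(task_id))
```

The caller still has to check whether the returned status is ready or failed before reading the result field.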

Process the results

This example prints the generated text to the standard output.

Short videos (synchronous)

For videos that are shorter than one hour, you can use a synchronous approach that returns results immediately without creating an analysis task. The sync endpoint supports both Pegasus 1.2 and Pegasus 1.5.

Response methods

Streaming responses

Streaming responses deliver text fragments in real time as they are generated, so your application can process them and give feedback immediately. This method is the default behavior of the platform and is ideal for applications requiring incremental updates.

Response format: A stream of JSON objects in NDJSON format, with three event types:
- stream_start: Marks the beginning of the stream.
- text_generation: Delivers a fragment of the generated text.
- stream_end: Signals the end of the stream.
Response handling:
- Iterate over the stream to process text fragments as they arrive.
Advantages:
- Real-time processing of partial results.
- Reduced perceived latency.
Use case: Live transcription, real-time analysis, or applications needing instant updates.
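If you consume the NDJSON stream without the SDK, each line is one JSON object. The sketch below shows how the three event types could be handled. It assumes the raw JSON keys are `event_type` and `text`, mirroring the attribute names used in the SDK example in this guide; verify them against the API reference before relying on them.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Join the text fragments from a sequence of NDJSON event lines."""
    fragments = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between events
        event = json.loads(line)
        kind = event.get("event_type")  # assumed key name
        if kind == "text_generation":
            fragments.append(event.get("text", ""))  # assumed key name
        elif kind == "stream_end":
            break  # nothing further to read
        # stream_start carries no text, so it is ignored
    return "".join(fragments)
```

To display output incrementally, print each fragment inside the loop instead of collecting it.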

Non-streaming responses

Non-streaming responses deliver the complete generated text in a single response, simplifying processing when the full result is needed.

Response format: A single string containing the full generated text.
Response handling:
- Access the complete text directly from the response.
Advantages:
- Simplicity in handling the full result.
- Immediate access to the entire text.
Use case: Generating reports, summaries, or any scenario where the whole text is required at once.

Copy and paste the code below, replacing the placeholders surrounded by <> with your values. This example uses streaming responses.

```
import time
from twelvelabs import TwelveLabs
from twelvelabs.types import VideoContext_AssetId, AnalyzePromptV2, SmeMediaSource

# 1. Initialize the client
client = TwelveLabs(api_key="<YOUR_API_KEY>")

# 2. Upload a video
asset = client.assets.create(
    method="url",
    url="<YOUR_VIDEO_URL>",  # Use direct links to raw media files. Video hosting platforms and cloud storage sharing links are not supported
    # Or use method="direct" and file=open("<PATH_TO_VIDEO_FILE>", "rb") to upload a local file up to 200 MB
)
print(f"Created asset: id={asset.id}")

# 3. Check the status of the asset
print("Waiting for asset to be ready...")
while True:
    asset = client.assets.retrieve(asset.id)
    if asset.status == "ready":
        print("Asset is ready")
        break
    if asset.status == "failed":
        raise RuntimeError(f"Asset processing failed: id={asset.id}")
    time.sleep(5)

# 4. Analyze your video
video = VideoContext_AssetId(asset_id=asset.id)
text_stream = client.analyze_stream(
    model_name="pegasus1.5",
    video=video,
    prompt_v_2=AnalyzePromptV2(
        input_text="<YOUR_PROMPT>",  # To use reference images: "Is there a <@product> in this video?"
        # media_sources=[
        #     SmeMediaSource(name="product", media_type="image", url="<YOUR_IMAGE_URL>"),
        # ],
    ),
    # temperature=0.2,
    # max_tokens=1024,
    # You can also use `response_format` to request structured JSON responses
)

# 5. Process the results
for text in text_stream:
    if text.event_type == "text_generation":
        print(text.text)
```

The video parameter and all optional parameters (model_name, temperature, max_tokens, response_format, etc.) work the same way as in the asynchronous approach above.

Troubleshooting

Truncated responses

When the response reaches the maximum response length or the context window, the platform returns the partial output and sets finish_reason to "length". A warning appears in the error field. This can happen for two reasons:

- The response reached the maximum response length. To fix this, increase the max_tokens value.
- The combined input and response reached the context window. To fix this, reduce the input (shorter prompt, shorter video clip, fewer reference images) or lower the max_tokens value.

Check usage.input_tokens and usage.output_tokens in the response to determine which limit was reached. For truncation error messages, see Error codes.

Note that Pegasus 1.2 does not use a context window. The maximum response length is 4,096 tokens and the prompt limit is 2,000 tokens.
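The check described above can be expressed as a small decision function. The sketch below is hypothetical: the function name and its return labels are illustrative, and the context window size is left as a parameter because this guide does not state it. Pass `context_window=None` for a model without a context window, such as Pegasus 1.2.

```python
from typing import Optional


def diagnose_truncation(
    finish_reason: str,
    input_tokens: int,
    output_tokens: int,
    max_tokens: int,
    context_window: Optional[int] = None,
) -> str:
    """Classify a response as complete or truncated, and say which limit was hit."""
    if finish_reason != "length":
        return "complete"
    if output_tokens >= max_tokens:
        return "max_tokens"      # raise max_tokens to get a longer response
    if context_window is not None and input_tokens + output_tokens >= context_window:
        return "context_window"  # reduce the input or lower max_tokens
    return "unknown"             # truncated, but neither limit is evident
```

The inputs correspond to finish_reason, usage.input_tokens, and usage.output_tokens in the response, plus the max_tokens value you sent with the request.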

Stay within the context window

Reduce input size:
- Keep prompts focused, especially for longer videos.
- Use start_time and end_time to analyze a portion of the video.
- Reduce the number or size of reference images.

Control output size:
- Set the max_tokens parameter to the length your application needs.
- Request only the fields your application requires.

Split large tasks:
- Break complex analysis into multiple requests.
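One way to split a long video across several requests is to compute consecutive time windows and analyze each one with start_time and end_time. The sketch below only computes the windows. It assumes times are expressed in seconds, and the helper name is hypothetical. How start_time and end_time are passed to the analysis call is not covered on this page, so check the API reference for the exact parameter placement.

```python
def split_into_windows(duration: float, window: float) -> list[tuple[float, float]]:
    """Split [0, duration] into consecutive (start, end) windows of at most `window` seconds."""
    duration = float(duration)
    window = float(window)
    if duration <= 0 or window <= 0:
        raise ValueError("duration and window must be positive")
    windows = []
    start = 0.0
    while start < duration:
        end = min(start + window, duration)  # the last window may be shorter
        windows.append((start, end))
        start = end
    return windows
```

Each tuple can then drive one request, and the per-window outputs can be combined in your application.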