The platform uses a multimodal approach to analyze videos and generate text. It processes visuals, sounds, spoken words, and on-screen text together, which captures nuances that a single-modality interpretation might miss and yields accurate, context-rich text based on the video content.
Key features:
Use cases:
On the Free plan, analyzed video hours count toward a shared limit that also covers indexing. On paid plans, you pay based on how much video you process and how many segment definitions you include. For examples, see the Frequently asked questions page.
For details on how your usage is measured and billed, see the Pricing page.
This section explains the key concepts and terminology used in this guide:
This guide shows how to upload your video as an asset and analyze it asynchronously. You can also pass a URL or base64-encoded data directly to the analysis call instead of creating an asset.
For videos under 1 hour, synchronous processing returns results immediately without polling and also supports streaming responses. For an example, see the Short videos (synchronous) section. Both modes accept the same input formats (asset ID, URL, or base64). For a full comparison, see Processing modes.
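As a rough illustration of the duration limits, the helper below maps a video's length to the processing modes it qualifies for. The function name and the returned labels are hypothetical and not part of the SDK. The 2-hour ceiling for asynchronous processing comes from the video requirements in this guide.

```python
SYNC_LIMIT_SECONDS = 60 * 60        # synchronous: videos under 1 hour
ASYNC_LIMIT_SECONDS = 2 * 60 * 60   # asynchronous: videos up to 2 hours


def available_modes(duration_seconds: float) -> list[str]:
    """Return the processing modes a video of this duration qualifies for."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    modes = []
    if duration_seconds < SYNC_LIMIT_SECONDS:
        modes.append("sync")
    if duration_seconds <= ASYNC_LIMIT_SECONDS:
        modes.append("async")
    if not modes:
        raise ValueError("video exceeds the 2-hour limit for analysis")
    return modes
```

A 10-minute video qualifies for both modes, while a 90-minute video qualifies for the asynchronous mode only.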
Customize text generation
You can configure the temperature to control output randomness, set the maximum token limit, and request structured JSON responses for programmatic processing. To extract timestamped segments with custom fields, see the Segment videos page.
To use the platform, you need an API key:
Depending on the programming language you are using, install the TwelveLabs SDK by entering one of the following commands:
Your video files must meet the following requirements:
Upload limits: Public video URLs up to 2 GB or local video files up to 200 MB. For local files up to 2 GB, see the Upload and processing methods page.
Analysis method: Videos up to 2 hours (asynchronous approach). For videos under 1 hour, see the synchronous approach below.
Model capabilities: See the complete requirements for resolution, aspect ratio, and supported formats.
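The upload limits above can be expressed as a small pre-flight check. This is a sketch only: the function name is hypothetical, `url` and `direct` mirror the upload method values used in this guide, `multipart` is just a label for the multipart upload route, and treating 1 MB as 1,000,000 bytes is an assumption.

```python
MB = 1000 * 1000   # assumption: decimal megabytes
GB = 1000 * MB

DIRECT_LIMIT = 200 * MB   # local files uploaded directly
MAX_LIMIT = 2 * GB        # public URLs and multipart uploads


def choose_upload_route(is_url: bool, size_bytes: int) -> str:
    """Pick an upload route from the source type and the file size."""
    if size_bytes <= 0:
        raise ValueError("size must be positive")
    if size_bytes > MAX_LIMIT:
        raise ValueError("file exceeds the 2 GB upload limit")
    if is_url:
        return "url"
    # Local files: direct upload up to 200 MB, multipart above that.
    return "direct" if size_bytes <= DIRECT_LIMIT else "multipart"
```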
Copy and paste the code below, replacing the placeholders surrounded by <> with your values.
Create a client instance to interact with the TwelveLabs Video Understanding Platform.
Function call: You call the constructor of the TwelveLabs class.
Parameters:
api_key: The API key to authenticate your requests to the platform.
Return value: An object of type TwelveLabs configured for making API calls.
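One way to avoid hard-coding the key is to read it from the environment before constructing the client. The sketch below assumes an environment variable named `TWELVELABS_API_KEY`. That name and the `load_api_key` helper are illustrative choices, not something the platform requires.

```python
import os


def load_api_key(env=None, var_name="TWELVELABS_API_KEY"):
    """Read the API key from the environment (variable name is an assumption)."""
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"set {var_name} before creating the client")
    return key


# Usage, assuming the SDK is installed and exposes the TwelveLabs class:
# from twelvelabs import TwelveLabs
# client = TwelveLabs(api_key=load_api_key())
```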
Upload a video to create an asset.
Function call: You call the assets.create function.
Parameters:
method: The upload method for your asset. Use url for a publicly accessible URL or direct to upload a local file. This example uses url.
url or file: The publicly accessible URL of your video or an opened file object in binary read mode. This example uses url.
Return value: An object of type Asset. This object contains, among other information, a field named id representing the unique identifier of your asset.
For local files larger than 200 MB, use multipart uploads. Multipart uploads support automatic retries, progress tracking, and parallel chunk uploads, which improve reliability, performance, and observability.
You only need this step for URL uploads larger than 200 MB. The platform processes these files asynchronously.
Function call: You call the assets.retrieve function.
Parameters:
asset_id: The unique identifier of your asset.
Return value: An object of type Asset containing, among other information, a field named status representing the current status of the asset. Check this field until its value is ready.
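The upload and wait steps could be combined roughly as follows. This is a sketch, not the official sample: it takes an already constructed client and assumes that `assets.create` and `assets.retrieve` accept the keyword arguments named above. The helper name, the polling interval, and the `max_polls` cap are illustrative.

```python
import time


def upload_and_wait(client, video_url, poll_seconds=5.0, max_polls=120,
                    sleep=time.sleep):
    """Create an asset from a public URL and wait until its status is ready."""
    asset = client.assets.create(method="url", url=video_url)
    for _ in range(max_polls):
        # Re-fetch the asset to read its current status.
        asset = client.assets.retrieve(asset_id=asset.id)
        if asset.status == "ready":
            return asset.id
        # Only "ready" is documented here; any other value means keep waiting.
        sleep(poll_seconds)
    raise TimeoutError(f"asset {asset.id} was not ready after {max_polls} checks")
```

The `sleep` parameter exists only so that the loop can be exercised without real delays.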
Create an analysis task to start processing your video. This operation is asynchronous.
Function call: You call the analyze_async.tasks.create method.
Parameters:
video: An object that specifies the source of the video. Provide one of the following:
asset_id: The unique identifier of an asset from a previous upload.
url: The publicly accessible URL of the video file.
base64_string: The base64-encoded video data.
This example uses the asset ID from the previous step.
prompt_v_2: A structured prompt. Set input_text to your prompt text (max 2,000 tokens). To include reference images, add entries to media_sources and use <@name> placeholders in input_text. See the commented lines in the code example above.
(Optional) temperature: Controls the randomness of the text output. A higher value generates more creative text, while a lower value produces more deterministic output.
(Optional) max_tokens: The maximum number of tokens to generate.
(Optional) response_format: Use this parameter to request structured JSON responses. For instructions, examples, and best practices, see the Structured responses page.
Return value: An object of type CreateAnalyzeTaskResponse containing a field named task_id, which represents the unique identifier of your analysis task. You can use this identifier to track the status of your task.
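The parameters above can be assembled and checked before the call. The helper below is a hypothetical sketch: it enforces that exactly one video source is given and includes optional parameters only when they are set. It assumes the SDK accepts plain dictionaries for `video` and `prompt_v_2`. If it expects typed objects instead, the same validation still applies.

```python
def build_analysis_request(input_text, *, asset_id=None, url=None,
                           base64_string=None, temperature=None,
                           max_tokens=None):
    """Build keyword arguments for creating an analysis task."""
    sources = {"asset_id": asset_id, "url": url, "base64_string": base64_string}
    provided = {name: value for name, value in sources.items() if value is not None}
    if len(provided) != 1:
        raise ValueError("provide exactly one of asset_id, url, or base64_string")
    request = {"video": provided, "prompt_v_2": {"input_text": input_text}}
    # Optional parameters are omitted unless explicitly set.
    if temperature is not None:
        request["temperature"] = temperature
    if max_tokens is not None:
        request["max_tokens"] = max_tokens
    return request


# Usage, assuming a configured client:
# task = client.analyze_async.tasks.create(
#     **build_analysis_request("Summarize this video.", asset_id="<YOUR_ASSET_ID>"))
# task_id = task.task_id
```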
The platform requires some time to process videos. Poll the status of the analysis task until processing completes. This example uses a loop to check the status every 5 seconds.
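A polling loop along these lines is one way to implement this step. It is a sketch under stated assumptions: the client is passed in, `analyze_async.tasks.retrieve` is assumed to accept `task_id` as a keyword argument, and the helper name and `max_polls` cap are illustrative. The status values are the ones listed for this step.

```python
import time


def wait_for_analysis(client, task_id, poll_seconds=5.0, max_polls=720,
                      sleep=time.sleep):
    """Poll an analysis task every poll_seconds until it is ready or failed."""
    for _ in range(max_polls):
        task = client.analyze_async.tasks.retrieve(task_id=task_id)
        if task.status == "ready":
            return task.result   # generated text and usage information
        if task.status == "failed":
            raise RuntimeError(f"analysis task {task_id} failed")
        # queued, pending, or processing: wait and check again.
        sleep(poll_seconds)
    raise TimeoutError(f"task {task_id} did not finish after {max_polls} checks")
```

Capping the number of checks keeps a stuck task from blocking the caller indefinitely. Adjust the cap to the length of your videos.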
Function call: You repeatedly call the analyze_async.tasks.retrieve method until the task completes.
Parameters:
task_id: The unique identifier of your analysis task.
Return value: An object of type AnalyzeTaskResponse containing, among other information, the following fields:
status: The current status of the task. The possible values are:
queued: The task is waiting to be processed.
pending: The task is queued and waiting to start.
processing: The platform is analyzing the video.
ready: Processing is complete. Results are available in the result field.
failed: The task failed.
result: When the status is ready, this field contains the generated text and usage information.
For videos that are shorter than one hour, you can use a synchronous approach that returns results immediately without creating an analysis task. The sync endpoint supports both Pegasus 1.2 and Pegasus 1.5.
Response methods
Streaming responses deliver text fragments in real time as they are generated, enabling immediate processing and feedback. This method is the default behavior of the platform and is ideal for applications requiring incremental updates.
stream_start: Marks the beginning of the stream.
text_generation: Delivers a fragment of the generated text.
stream_end: Signals the end of the stream.
Non-streaming responses deliver the complete generated text in a single response, simplifying processing when the full result is needed.
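To make the streaming event sequence concrete, the sketch below assembles the full text from a stream of events. It assumes each event exposes an `event_type` attribute and that `text_generation` events carry the fragment in a `text` attribute. Both attribute names and the helper itself are assumptions, so check them against the objects your SDK version returns.

```python
def collect_stream_text(events):
    """Join text_generation fragments between stream_start and stream_end."""
    started = False
    fragments = []
    for event in events:
        if event.event_type == "stream_start":
            started = True
        elif event.event_type == "text_generation":
            if not started:
                raise ValueError("text fragment received before stream_start")
            fragments.append(event.text)
        elif event.event_type == "stream_end":
            return "".join(fragments)
        # Any other event type is ignored.
    raise ValueError("stream closed without a stream_end event")
```

An application that needs incremental updates would display each fragment as it arrives instead of collecting them.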
Copy and paste the code below, replacing the placeholders surrounded by <> with your values.
The video parameter and all optional parameters (model_name, temperature, max_tokens, response_format, start_time, end_time) function the same as in the asynchronous approach above. The start_time and end_time parameters require model_name set to "pegasus1.5".
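The model constraint on `start_time` and `end_time` can be checked on the client side before sending a request. The helper below is a hypothetical sketch. The ordering check on the two values is an added assumption, not a documented rule.

```python
def build_sync_options(model_name=None, start_time=None, end_time=None,
                       **other_options):
    """Collect optional parameters, enforcing the pegasus1.5 requirement."""
    uses_range = start_time is not None or end_time is not None
    if uses_range and model_name != "pegasus1.5":
        raise ValueError('start_time and end_time require model_name="pegasus1.5"')
    # Assumption: when both are given, the range must be non-empty.
    if start_time is not None and end_time is not None and end_time <= start_time:
        raise ValueError("end_time must be greater than start_time")
    options = {"model_name": model_name, "start_time": start_time,
               "end_time": end_time, **other_options}
    # Drop unset values so that platform defaults apply.
    return {name: value for name, value in options.items() if value is not None}
```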