Analyze videos
The platform uses a multimodal approach to analyze videos and generate text, processing visuals, sounds, spoken words, and on-screen text to provide a comprehensive understanding. This method captures nuances that unimodal interpretations might miss, allowing for accurate, context-rich text generation based on video content.
Key features:
- Multimodal analysis: Processes visuals, sounds, spoken words, and on-screen text for a holistic understanding of video content.
- Customizable prompts: Allows tailored outputs through instructive, descriptive, or question-based prompts.
- Flexible text generation: Supports various tasks, including summarization, chaptering, and open-ended text generation.
Use cases:
- Content structuring: Organize and structure content for e-learning platforms to improve usability.
- SEO optimization: Optimize content to rank higher in search engine results.
- Highlight creation: Create short, engaging video clips for media and broadcasting.
- Incident reporting: Record and report incidents for security and law enforcement purposes.
For details on how your usage is measured and billed, see the Pricing page.
Key concepts
This section explains the key concepts and terminology used in this guide:
- Asset: Your uploaded content. Once created, you can reference the same asset across multiple operations without uploading the file again.
- Analysis task: An asynchronous operation for processing your video and generating text. Contains a status and the resulting text when complete.
Workflow
This guide shows how to upload your video as an asset and analyze it asynchronously. You can also pass a URL or base64-encoded data directly to the analysis call instead of creating an asset.
For videos under 1 hour, synchronous processing returns results immediately without polling and also supports streaming responses. For details, see the Short videos (synchronous) section.
Both modes accept the same input formats (asset ID, URL, or base64). For a full comparison, see Processing modes.
Customize text generation
You can configure the temperature to control output randomness, set the maximum token limit, and request structured JSON responses for programmatic processing.
Prerequisites
- To use the platform, you need an API key.
- Install the TwelveLabs SDK for the programming language you are using.
- Your video files must meet the following requirements:
  - For this guide: Videos up to 2 hours (asynchronous approach). For videos under 1 hour, see the synchronous approach below.
  - Model capabilities: See the complete requirements for resolution, aspect ratio, and supported formats.
For upload size limits and processing modes, see the Upload and processing methods page.
Complete example
Copy and paste the code below, replacing the placeholders surrounded by <> with your values.
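The original example is not reproduced here, so the following Python sketch assembles the steps described in the walkthrough below: upload an asset, create an analysis task, and poll for completion. The method and field names (assets.create, analyze_async.tasks.create, analyze_async.tasks.retrieve, status, result) mirror this guide, but treat the exact SDK signatures as assumptions to verify against the SDK reference.

```python
import time


def analyze_video(client, video_url, prompt, poll_interval=5):
    """Upload a video, start an analysis task, and poll until it completes.

    `client` is a TwelveLabs instance, e.g.:
        from twelvelabs import TwelveLabs
        client = TwelveLabs(api_key="<YOUR_API_KEY>")
    Method and field names mirror the walkthrough below; verify them
    against the SDK reference.
    """
    # Step 1: upload the video as an asset via a publicly accessible URL.
    asset = client.assets.create(method="url", url=video_url)

    # Step 2: start the asynchronous analysis task, referencing the asset.
    task = client.analyze_async.tasks.create(
        video={"asset_id": asset.id},
        prompt=prompt,
    )

    # Step 3: poll every few seconds until the task reaches a terminal status.
    while True:
        resp = client.analyze_async.tasks.retrieve(task_id=task.task_id)
        if resp.status == "ready":
            return resp.result
        if resp.status == "failed":
            raise RuntimeError("Analysis task failed")
        time.sleep(poll_interval)


# Usage (network call; requires an API key):
# client = TwelveLabs(api_key="<YOUR_API_KEY>")
# print(analyze_video(client, "<YOUR_VIDEO_URL>", "Summarize this video."))
```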
Code explanation
Python
Node.js
Import the SDK and initialize the client
Create a client instance to interact with the TwelveLabs Video Understanding Platform.
Function call: You call the constructor of the TwelveLabs class.
Parameters:
api_key: The API key to authenticate your requests to the platform.
Return value: An object of type TwelveLabs configured for making API calls.
Upload a video
Upload a video to create an asset. For details about the available upload methods and the corresponding limits, see the Upload and processing methods page.
Function call: You call the assets.create function.
Parameters:
- method: The upload method for your asset. Use url for a publicly accessible URL or direct to upload a local file. This example uses url.
- url or file: The publicly accessible URL of your video or an opened file object in binary read mode. This example uses url.
Return value: An object of type Asset. This object contains, among other information, a field named id representing the unique identifier of your asset.
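The two upload methods described above can be sketched as a small helper. The parameter names (method, url, file) come from this step; the exact SDK signature is an assumption to check against the reference.

```python
def upload_asset(client, source):
    """Create an asset from a publicly accessible URL or a local file path.

    The method/url/file parameter names follow the step above; verify the
    exact assets.create signature against the SDK reference.
    """
    if source.startswith(("http://", "https://")):
        # Publicly accessible URL.
        return client.assets.create(method="url", url=source)
    # Local file, opened in binary read mode.
    with open(source, "rb") as f:
        return client.assets.create(method="direct", file=f)
```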
Analyze your video
Create an analysis task to start processing your video. This operation is asynchronous.
Function call: You call the analyze_async.tasks.create method.
Parameters:
- video: An object that specifies the source of the video. Provide one of the following:
  - asset_id: The unique identifier of an asset from a previous upload.
  - url: The publicly accessible URL of the video file.
  - base64_string: The base64-encoded video data.
  This example uses the asset ID from the previous step.
- prompt: A string that guides the model on the desired format or content. The maximum length of a prompt is 2,000 tokens.
- (Optional) temperature: Controls the randomness of the text output. A higher value generates more creative text, while a lower value produces more deterministic output.
- (Optional) max_tokens: The maximum number of tokens to generate.
- (Optional) response_format: Use this parameter to request structured JSON responses. For instructions, examples, and best practices, see the Structured responses page.
Return value: An object of type CreateAnalyzeTaskResponse containing a field named task_id, which represents the unique identifier of your analysis task. You can use this identifier to track the status of your task.
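Passing the optional tuning parameters might look like the sketch below. The keyword names mirror the parameter list above; the exact tasks.create signature is an assumption to confirm in the SDK reference.

```python
def create_analysis_task(client, asset_id, prompt):
    """Start an analysis task with the optional tuning parameters.

    Keyword names mirror the parameter list above; confirm the exact
    signature in the SDK reference.
    """
    return client.analyze_async.tasks.create(
        video={"asset_id": asset_id},  # or {"url": ...} / {"base64_string": ...}
        prompt=prompt,                 # at most 2,000 tokens
        temperature=0.2,               # optional: lower = more deterministic
        max_tokens=500,                # optional: cap on generated tokens
    )
```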
Monitor the status
The platform requires some time to process videos. Poll the status of the analysis task until processing completes. This example uses a loop to check the status every 5 seconds.
Function call: You repeatedly call the analyze_async.tasks.retrieve method until the task completes.
Parameters:
task_id: The unique identifier of your analysis task.
Return value: An object of type AnalyzeTaskResponse containing, among other information, the following fields:
- status: The current status of the task. The possible values are:
  - queued: The task is waiting to be processed.
  - pending: The task is queued and waiting to start.
  - processing: The platform is analyzing the video.
  - ready: Processing is complete. Results are available in the result field.
  - failed: The task failed.
- result: When the status is ready, this field contains the generated text and usage information.
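The polling loop described above can be written as a small reusable helper that takes the retrieve call as input. The status names come from the list above; the helper itself is a sketch, not part of the SDK.

```python
import time

# Terminal statuses, per the status list above.
TERMINAL_STATUSES = {"ready", "failed"}


def wait_for_task(retrieve, task_id, interval=5):
    """Poll until the task reaches a terminal status, then return the
    final response. Pass client.analyze_async.tasks.retrieve as `retrieve`."""
    while True:
        resp = retrieve(task_id=task_id)
        if resp.status in TERMINAL_STATUSES:
            return resp
        time.sleep(interval)
```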
Short videos (synchronous)
For videos that are shorter than one hour, you can use a synchronous approach that returns results immediately without creating an analysis task.
Response methods
Streaming responses
Streaming responses deliver text fragments in real time as they are generated, enabling immediate processing and feedback. This method is the platform's default behavior and is ideal for applications requiring incremental updates.
- Response format: A stream of JSON objects in NDJSON format, with three event types:
  - stream_start: Marks the beginning of the stream.
  - text_generation: Delivers a fragment of the generated text.
  - stream_end: Signals the end of the stream.
- Response handling:
- Iterate over the stream to process text fragments as they arrive.
- Advantages:
- Real-time processing of partial results.
- Reduced perceived latency.
- Use case: Live transcription, real-time analysis, or applications needing instant updates.
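Given the three event types listed above, a stream consumer might accumulate the generated text as sketched below. The event field names (event_type, text) are assumptions; check the streaming event schema in the API reference.

```python
import json


def collect_text(ndjson_lines):
    """Accumulate generated text from an NDJSON stream of analysis events.

    Field names (event_type, text) are assumptions; check the API reference
    for the actual event schema.
    """
    parts = []
    for line in ndjson_lines:
        if not line.strip():
            continue  # skip blank lines between NDJSON records
        event = json.loads(line)
        if event.get("event_type") == "text_generation":
            parts.append(event.get("text", ""))
    return "".join(parts)
```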
Non-streaming responses
Non-streaming responses deliver the complete generated text in a single response, simplifying processing when the full result is needed.
- Response format: A single string containing the full generated text.
- Response handling:
- Access the complete text directly from the response.
- Advantages:
- Simplicity in handling the full result.
- Immediate access to the entire text.
- Use case: Generating reports, summaries, or any scenario where the whole text is required at once.
Copy and paste the code below, replacing the placeholders surrounded by <> with your values.
Streaming responses
Non-streaming responses
The video parameter and all optional parameters (temperature, max_tokens, response_format) function the same as in the asynchronous approach above.
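The two synchronous response methods might be used as in the sketch below. The method names client.analyze and client.analyze_stream, and the event field names, are hypothetical; check the SDK reference for the actual synchronous entry points.

```python
def analyze_sync(client, video_url, prompt):
    """Non-streaming: a single call returns the full generated text.

    `client.analyze` is an assumed method name; verify it against the
    SDK reference.
    """
    return client.analyze(video={"url": video_url}, prompt=prompt)


def analyze_streaming(client, video_url, prompt):
    """Streaming: iterate over events and collect text fragments.

    `client.analyze_stream` and the event fields (event_type, text) are
    assumed names; verify them against the SDK reference.
    """
    parts = []
    for event in client.analyze_stream(video={"url": video_url}, prompt=prompt):
        if getattr(event, "event_type", None) == "text_generation":
            parts.append(event.text)
    return "".join(parts)
```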