Analyze videos

The platform uses a multimodal approach to analyze videos and generate text, processing visuals, sounds, spoken words, and on-screen text to build a comprehensive understanding. This approach captures nuances that unimodal analysis might miss, enabling accurate, context-rich text generation from video content.

Key features:

  • Multimodal analysis: Processes visuals, sounds, spoken words, and on-screen text for a holistic understanding of video content.
  • Customizable prompts: Allows tailored outputs through instructive, descriptive, or question-based prompts.
  • Flexible text generation: Supports various tasks, including summarization, chaptering, and open-ended text generation.

Use cases:

  • Content structuring: Organize and structure content for e-learning platforms to improve usability.
  • SEO optimization: Optimize content to rank higher in search engine results.
  • Highlight creation: Create short, engaging video clips for media and broadcasting.
  • Incident reporting: Record and report incidents for security and law enforcement purposes.

For details on how your usage is measured and billed, see the Pricing page.

Key concepts

This section explains the key concepts and terminology used in this guide:

  • Asset: Your uploaded content. Once created, you can reference the same asset across multiple operations without uploading the file again.
  • Analysis task: An asynchronous operation for processing your video and generating text. Contains a status and the resulting text when complete.

Workflow

This guide shows how to upload your video as an asset and analyze it asynchronously. You can also pass a URL or base64-encoded data directly to the analysis call instead of creating an asset.

For videos under 1 hour, synchronous processing returns results immediately without polling and also supports streaming responses. For details, see the Short videos (synchronous) section.

Both modes accept the same input formats (asset ID, URL, or base64). For a full comparison, see Processing modes.

Customize text generation

You can configure the temperature to control output randomness, set the maximum token limit, and request structured JSON responses for programmatic processing.
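As a minimal sketch, these options can be grouped ahead of the analysis call. The parameter names (`temperature`, `max_tokens`) come from this guide; the dictionary grouping is only a convenience, not something the SDK requires:

```python
# Two hypothetical option sets; temperature and max_tokens are the
# parameters described in this guide, the dict grouping is our own choice.
deterministic_opts = {"temperature": 0.1, "max_tokens": 512}   # terse, repeatable output
creative_opts = {"temperature": 0.9, "max_tokens": 2048}       # freer, longer output

# These unpack into the analysis call as keyword arguments, for example:
# client.analyze_async.tasks.create(video=video, prompt="...", **deterministic_opts)
print(deterministic_opts["temperature"], creative_opts["max_tokens"])
```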

Prerequisites

  • To use the platform, you need an API key:

    1. If you don’t have an account, sign up for a free account.
    2. Go to the API Keys page.
    3. If you need to create a new key, select the Create API Key button. Enter a name and set the expiration period. The default is 12 months.
    4. Select the Copy icon next to your key to copy it to your clipboard.

  • Depending on the programming language you are using, install the TwelveLabs SDK by entering one of the following commands:

    $ pip install twelvelabs
  • Your video files must meet the following requirements:

    • For this guide: Videos up to 2 hours (asynchronous approach). For videos under 1 hour, see the synchronous approach below.
    • Model capabilities: See the complete requirements for resolution, aspect ratio, and supported formats.

    For upload size limits and processing modes, see the Upload and processing methods page.

Complete example

Copy and paste the code below, replacing the placeholders surrounded by <> with your values.

import time
from twelvelabs import TwelveLabs
from twelvelabs.types import VideoContext_AssetId, VideoContext_Url, VideoContext_Base64String

# 1. Initialize the client
client = TwelveLabs(api_key="<YOUR_API_KEY>")

# 2. Upload a video
asset = client.assets.create(
    method="url",
    url="<YOUR_VIDEO_URL>"  # Use direct links to raw media files. Video hosting platforms and cloud storage sharing links are not supported
    # Or use method="direct" and file=open("<PATH_TO_VIDEO_FILE>", "rb") to upload a file from the local file system
)
print(f"Created asset: id={asset.id}")

# 3. Analyze your video
video = VideoContext_AssetId(asset_id=asset.id)
# Or instead of creating an asset, pass the video inline:
# video = VideoContext_Url(url="<YOUR_VIDEO_URL>")
# video = VideoContext_Base64String(base64_string="<YOUR_BASE64_DATA>")
task = client.analyze_async.tasks.create(
    video=video,
    prompt="<YOUR_PROMPT>",
    # temperature=0.2,
    # max_tokens=1024,
    # You can also use `response_format` to request structured JSON responses
)
print(f"Task ID: {task.task_id}")

# 4. Monitor the status
while True:
    task = client.analyze_async.tasks.retrieve(task.task_id)

    if task.status == "ready":
        print("Task completed")
        break
    elif task.status == "failed":
        print("Task failed")
        break
    else:
        print("Task still processing...")
        time.sleep(5)

# 5. Process the results
print(f"{task.result.data}")

Code explanation

1. Import the SDK and initialize the client

Create a client instance to interact with the TwelveLabs Video Understanding Platform.
Function call: You call the constructor of the TwelveLabs class.
Parameters:

  • api_key: The API key to authenticate your requests to the platform.

Return value: An object of type TwelveLabs configured for making API calls.
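Rather than hard-coding the key, a common pattern is to read it from an environment variable. A minimal sketch; the variable name TWELVELABS_API_KEY and the helper are our own choices, not something the SDK mandates:

```python
import os

# Hypothetical pattern: keep the key out of source code. The variable name
# TWELVELABS_API_KEY is our choice, not required by the SDK.
def load_api_key(env="TWELVELABS_API_KEY", fallback="<YOUR_API_KEY>"):
    return os.environ.get(env, fallback)

# client = TwelveLabs(api_key=load_api_key())
print(load_api_key())
```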

2. Upload a video

Upload a video to create an asset. For details about the available upload methods and the corresponding limits, see the Upload and processing methods page.
Function call: You call the assets.create function.
Parameters:

  • method: The upload method for your asset. Use url to provide a publicly accessible URL or direct to upload a local file. This example uses url.
  • url or file: The publicly accessible URL of your video or an opened file object in binary read mode. This example uses url.

Return value: An object of type Asset. This object contains, among other information, a field named id representing the unique identifier of your asset.

3. Analyze your video

Create an analysis task to start processing your video. This operation is asynchronous.
Function call: You call the analyze_async.tasks.create method.
Parameters:

  • video: An object that specifies the source of the video. Provide one of the following:

    • asset_id: The unique identifier of an asset from a previous upload.
    • url: The publicly accessible URL of the video file.
    • base64_string: The base64-encoded video data.

    This example uses the asset ID from the previous step.

  • prompt: A string that guides the model on the desired format or content. The maximum length of a prompt is 2,000 tokens.

  • (Optional) temperature: Controls the randomness of the text output. A higher value generates more creative text, while a lower value produces more deterministic output.

  • (Optional) max_tokens: The maximum number of tokens to generate.

  • (Optional) response_format: Use this parameter to request structured JSON responses. For instructions, examples, and best practices, see the Structured responses page.

Return value: An object of type CreateAnalyzeTaskResponse containing a field named task_id, which represents the unique identifier of your analysis task. You can use this identifier to track the status of your task.
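For the base64_string input, the raw bytes of the video file must be base64-encoded first. A minimal sketch using only the standard library; the helper name is ours:

```python
import base64

# Hypothetical helper producing the string expected by
# VideoContext_Base64String(base64_string=...) from raw file bytes.
def to_base64_string(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")

# In practice: to_base64_string(open("<PATH_TO_VIDEO_FILE>", "rb").read())
print(to_base64_string(b"example bytes"))
```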

4. Monitor the status

The platform requires some time to process videos. Poll the status of the analysis task until processing completes. This example uses a loop to check the status every 5 seconds.
Function call: You repeatedly call the analyze_async.tasks.retrieve method until the task completes.

Parameters:

  • task_id: The unique identifier of your analysis task.

Return value: An object of type AnalyzeTaskResponse containing, among other information, the following fields:

  • status: The current status of the task. The possible values are:
    • queued: The task is waiting in the queue to be processed.
    • pending: The task has been created and has not yet entered the queue.
    • processing: The platform is analyzing the video.
    • ready: Processing is complete. Results are available in the result field.
    • failed: The task failed.
  • result: When the status is ready, this field contains the generated text and usage information.
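The polling loop in the example above can be generalized into a helper with a timeout. A sketch under stated assumptions: wait_for_task is our own helper, and fetch_status stands in for a call such as client.analyze_async.tasks.retrieve(task_id).status:

```python
import time

# Hypothetical polling helper; raises if the task does not reach a
# terminal status ("ready" or "failed") within the timeout.
def wait_for_task(fetch_status, interval=5.0, timeout=600.0):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status in ("ready", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError("task did not finish within the timeout")

# Usage with a stubbed status sequence instead of real API calls:
statuses = iter(["queued", "processing", "ready"])
print(wait_for_task(lambda: next(statuses), interval=0.0))
```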
5. Process the results

This example prints the generated text to the standard output.

Short videos (synchronous)

For videos that are shorter than one hour, you can use a synchronous approach that returns results immediately without creating an analysis task.

Response methods

Streaming responses deliver text fragments in real time as they are generated, enabling immediate processing and feedback. Streaming is the platform's default behavior and is ideal for applications that require incremental updates.

  • Response format: A stream of JSON objects in NDJSON format, with three event types:
    • stream_start: Marks the beginning of the stream.
    • text_generation: Delivers a fragment of the generated text.
    • stream_end: Signals the end of the stream.
  • Response handling:
    • Iterate over the stream to process text fragments as they arrive.
  • Advantages:
    • Real-time processing of partial results.
    • Reduced perceived latency.
  • Use case: Live transcription, real-time analysis, or applications needing instant updates.
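To make the event handling concrete, here is a sketch with stubbed event objects mirroring the three event types above. The Event class and collect_text helper are our own; the real SDK yields its own objects, which similarly expose an event_type and the generated text:

```python
from dataclasses import dataclass

# Hypothetical stand-in for the SDK's stream events (names are ours).
@dataclass
class Event:
    event_type: str
    text: str = ""

# Accumulate only text_generation fragments into the full generated text.
def collect_text(stream) -> str:
    parts = []
    for event in stream:
        if event.event_type == "text_generation":
            parts.append(event.text)
    return "".join(parts)

mock_stream = [
    Event("stream_start"),
    Event("text_generation", "A drone shot "),
    Event("text_generation", "over the coastline."),
    Event("stream_end"),
]
print(collect_text(mock_stream))
```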

Non-streaming responses deliver the complete generated text in a single response, simplifying processing when the full result is needed.

  • Response format: A single string containing the full generated text.
  • Response handling:
    • Access the complete text directly from the response.
  • Advantages:
    • Simplicity in handling the full result.
    • Immediate access to the entire text.
  • Use case: Generating reports, summaries, or any scenario where the whole text is required at once.

Copy and paste the code below, replacing the placeholders surrounded by <> with your values.

from twelvelabs import TwelveLabs
from twelvelabs.types import VideoContext_AssetId

# 1. Initialize the client
client = TwelveLabs(api_key="<YOUR_API_KEY>")

# 2. Upload a video
asset = client.assets.create(
    method="url",
    url="<YOUR_VIDEO_URL>"  # Use direct links to raw media files. Video hosting platforms and cloud storage sharing links are not supported
    # Or use method="direct" and file=open("<PATH_TO_VIDEO_FILE>", "rb") to upload a file from the local file system
)
print(f"Created asset: id={asset.id}")

# 3. Analyze your video
video = VideoContext_AssetId(asset_id=asset.id)
text_stream = client.analyze_stream(
    video=video,
    prompt="<YOUR_PROMPT>",
    # temperature=0.2,
    # max_tokens=1024,
    # You can also use `response_format` to request structured JSON responses
)

# 4. Process the results
for text in text_stream:
    if text.event_type == "text_generation":
        print(text.text)

The video parameter and all optional parameters (temperature, max_tokens, response_format) function the same as in the asynchronous approach above.