Analyze videos

The platform uses a multimodal approach to analyze videos and generate text, processing visuals, sounds, spoken words, and on-screen text to provide a comprehensive understanding. This method captures nuances that unimodal interpretations might miss, allowing for accurate and context-rich text generation based on video content.

Key features:

  • Multimodal analysis: Processes visuals, sounds, spoken words, and on-screen text for a holistic understanding of video content.
  • Customizable prompts: Allows tailored outputs through instructive, descriptive, or question-based prompts.
  • Flexible text generation: Supports various tasks, including summarization, chaptering, and open-ended text generation.

Use cases:

  • Content structuring: Organize and structure content for e-learning platforms to improve usability.
  • SEO optimization: Optimize content to rank higher in search engine results.
  • Highlight creation: Create short, engaging video clips for media and broadcasting.
  • Incident reporting: Record and report incidents for security and law enforcement purposes.

For details on how your usage is measured and billed, see the Pricing page.

Key concepts

This section explains the key concepts and terminology used in this guide:

  • Index: A container that organizes your video content
  • Asset: Your uploaded file
  • Indexed asset: A video that has been indexed and is ready for downstream tasks

Workflow

Upload and index your videos before you analyze them. The platform indexes videos asynchronously. After indexing completes, you can analyze your videos using custom prompts to generate summaries, extract insights, answer content-related questions, or create structured responses tailored to your specific requirements.

Customize text generation

You can customize text generation in the following ways (a brief sketch follows this list):

  • Adjust the temperature to control the randomness of the output
  • Set the maximum token limit in the response
  • Request structured JSON responses for programmatic processing
  • Choose a response method based on your use case
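
As a minimal sketch, assuming an initialized client and an already indexed video as in the complete example later in this guide (identifiers and values are placeholders, not recommendations), these options map to parameters of the analyze_stream call as follows:

# Sketch: customizing generation parameters (values are illustrative)
text_stream = client.analyze_stream(
    video_id="<YOUR_INDEXED_ASSET_ID>",
    prompt="Summarize this video in three bullet points.",
    temperature=0.2,   # lower values produce more deterministic output
    max_tokens=512,    # cap the length of the generated text
    # response_format=...  # request structured JSON; see the Structured responses page
)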

Response methods

Streaming responses deliver text fragments in real time as they are generated, enabling immediate processing and feedback. This method is the platform's default behavior and is ideal for applications requiring incremental updates.

  • Response format: A stream of JSON objects in NDJSON format, with three event types:
    • stream_start: Marks the beginning of the stream.
    • text_generation: Delivers a fragment of the generated text.
    • stream_end: Signals the end of the stream.
  • Response handling:
    • Iterate over the stream to process text fragments as they arrive (see the sketch after this list).
  • Advantages:
    • Real-time processing of partial results.
    • Reduced perceived latency.
  • Use case: Live transcription, real-time analysis, or applications needing instant updates.
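
A minimal handling sketch, assuming an initialized client and an indexed video as in the complete example later in this guide:

# Sketch: iterate over the event stream and react to each event type
text_stream = client.analyze_stream(
    video_id="<YOUR_INDEXED_ASSET_ID>",
    prompt="Describe the main events in this video."
)
fragments = []
for event in text_stream:
    if event.event_type == "stream_start":
        print("Stream started")
    elif event.event_type == "text_generation":
        fragments.append(event.text)  # process each fragment as it arrives
    elif event.event_type == "stream_end":
        print("Stream ended")
print("".join(fragments))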

Non-streaming responses deliver the complete generated text in a single response, simplifying processing when the full result is needed.

  • Response format: A single string containing the full generated text.
  • Response handling:
    • Access the complete text directly from the response (see the sketch after this list).
  • Advantages:
    • Simplicity in handling the full result.
    • Immediate access to the entire text.
  • Use case: Generating reports, summaries, or any scenario where the whole text is required at once.
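
This guide does not show the non-streaming call itself; the sketch below assumes a client.analyze counterpart that accepts the same parameters and returns the full text in a data field. Both names are assumptions; check the SDK reference for the exact method and field names.

# Sketch: non-streaming analysis (method and field names are assumptions)
result = client.analyze(
    video_id="<YOUR_INDEXED_ASSET_ID>",
    prompt="Write a one-paragraph summary of this video."
)
print(result.data)  # hypothetical field holding the complete generated text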

Prerequisites

  • To use the platform, you need an API key:

    1. If you don’t have an account, sign up for a free account.
    2. Go to the API Keys page.
    3. Select the Copy icon next to your key.

  • Install the TwelveLabs SDK for Python:

    pip install twelvelabs
  • Your video files must meet the following requirements:

    • For this guide: Files up to 4 GB when using publicly accessible URLs or 200 MB for local files
    • Model capabilities: See the complete requirements for resolution, aspect ratio, and supported formats.

    For other upload methods with different limits, see the Upload methods page.

Complete example

Copy and paste the code below, replacing the placeholders surrounded by <> with your values.

import time
from twelvelabs import TwelveLabs

# 1. Initialize the client
client = TwelveLabs(api_key="<YOUR_API_KEY>")

# 2. Create an index
# An index is a container for organizing your video content
index = client.indexes.create(
    index_name="<YOUR_INDEX_NAME>",
    models=[{"model_name": "pegasus1.2", "model_options": ["visual", "audio"]}]
)
if not index.id:
    raise RuntimeError("Failed to create an index.")
print(f"Created index: id={index.id}")

# 3. Upload file
asset = client.assets.create(
    method="url",
    url="<YOUR_VIDEO_URL>"  # Use direct links to raw media files. Video hosting platforms and cloud storage sharing links are not supported
    # Or use method="direct" and file=open("<PATH_TO_VIDEO_FILE>", "rb") to upload a file from the local file system
)
print(f"Created asset: id={asset.id}")

# 4. Add your asset to an index
indexed_asset = client.indexes.indexed_assets.create(
    index_id=index.id,
    asset_id=asset.id,
    # enable_video_stream=True
)
print(f"Created indexed asset: id={indexed_asset.id}")

# 5. Monitor the indexing process
print("Waiting for indexing to complete.")
while True:
    indexed_asset = client.indexes.indexed_assets.retrieve(
        index_id=index.id,
        indexed_asset_id=indexed_asset.id
    )
    print(f"  Status={indexed_asset.status}")

    if indexed_asset.status == "ready":
        print("Indexing complete!")
        break
    elif indexed_asset.status == "failed":
        raise RuntimeError("Indexing failed")

    time.sleep(5)

# 6. Perform open-ended analysis
text_stream = client.analyze_stream(
    video_id=indexed_asset.id,
    prompt="<YOUR_PROMPT>",
    # temperature=0.2,
    # max_tokens=1024,
    # You can also use `response_format` to request structured JSON responses
)

# 7. Process the results
for text in text_stream:
    if text.event_type == "text_generation":
        print(text.text)

Code explanation

1. Import the SDK and initialize the client

Create a client instance to interact with the TwelveLabs Video Understanding Platform.
Function call: You call the constructor of the TwelveLabs class.
Parameters:

  • api_key: The API key to authenticate your requests to the platform.

Return value: An object of type TwelveLabs configured for making API calls.

2. Create an index

Indexes store and organize your video data, allowing you to group related videos. This guide shows how to create one, but you can also use an existing index.
Function call: You call the indexes.create function.
Parameters:

  • index_name: The name of the index.
  • models: An array specifying your model configuration. This example enables the Pegasus video understanding model and specifies that it analyzes visual and audio modalities.

See the Indexes page for more details on creating an index and specifying the model configuration.

Return value: An object of type IndexesCreateResponse containing a field named id representing the unique identifier of the newly created index.

3. Upload a video

Upload a video to create an asset. For details about the available upload methods and the corresponding limits, see the Upload methods page.
Function call: You call the assets.create function.
Parameters:

  • method: The upload method for your asset. Use url for a publicly accessible URL or direct to upload a local file (see the local-file sketch after this step). This example uses url.
  • url or file: The publicly accessible URL of your video or an opened file object in binary read mode. This example uses url.

Return value: An object of type Asset. This object contains, among other information, a field named id representing the unique identifier of your asset.
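
If your video is on the local file system, use the direct method with an opened file object, as noted in the comment in the complete example above. A minimal sketch:

# Sketch: upload a local video file instead of a publicly accessible URL
with open("<PATH_TO_VIDEO_FILE>", "rb") as video_file:
    asset = client.assets.create(
        method="direct",
        file=video_file
    )
print(f"Created asset: id={asset.id}")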

4. Index your video

Index your video by adding the asset created in the previous step to an index. This operation is asynchronous.
Function call: You call the indexes.indexed_assets.create function.
Parameters:

  • index_id: The unique identifier of the index to which the asset will be indexed.
  • asset_id: The unique identifier of your asset.
  • (Optional) enable_video_stream: Specifies whether the platform stores the video for streaming. When set to True, you can retrieve its URL by calling the indexes.indexed_assets.retrieve method (see the sketch after this step).

Return value: An object of type IndexedAssetsCreateResponse. This object contains a field named id representing the unique identifier of your indexed asset.
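
A minimal sketch with streaming enabled. This guide does not name the field that carries the streaming URL, so the sketch only retrieves the indexed asset and prints it; inspect the returned object to find the URL for your SDK version.

# Sketch: index with video streaming enabled, then retrieve the details
indexed_asset = client.indexes.indexed_assets.create(
    index_id=index.id,
    asset_id=asset.id,
    enable_video_stream=True
)
details = client.indexes.indexed_assets.retrieve(
    index_id=index.id,
    indexed_asset_id=indexed_asset.id
)
# The attribute holding the streaming URL depends on the SDK version
print(details)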

5. Monitor the indexing process

The platform requires some time to index videos. Check the status of the indexing process until it’s completed.
Function call: You call the indexes.indexed_assets.retrieve function.
Parameters:

  • index_id: The unique identifier of your video index.
  • indexed_asset_id: The unique identifier of your indexed asset.

Return value: An object of type IndexedAssetDetailed containing, among other information, a field named status representing the status of the indexing process. Wait until the value of this field is ready.

6. Perform open-ended analysis

Function call: You call the analyze_stream method.
Parameters:

  • video_id: The unique identifier of the video for which you want to generate text.
  • prompt: A string that guides the model on the desired format or content. The maximum length of a prompt is 2,000 tokens.
  • (Optional) temperature: A number that controls the randomness of the text. A higher value generates more creative text, while a lower value produces more deterministic text.
  • (Optional) max_tokens: The maximum number of tokens to generate.
  • (Optional) response_format: Use this parameter to request structured JSON responses. For instructions, examples, and best practices, see the Structured responses page.

Return value: An object of type Iterator[StreamAnalyzeResponse] that handles streaming HTTP responses and provides an iterator interface allowing you to process text fragments as they arrive. The maximum length of the response is 4,096 tokens.

Note

If you encounter timeout errors, increase the timeout parameter when you initialize the TwelveLabs client. The default timeout is 60 seconds, which may not be sufficient for complex prompts, especially with non-streaming responses.
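
A minimal sketch of a client configured with a longer timeout. The value is illustrative; the 60-second default mentioned above suggests the parameter is expressed in seconds, but confirm this against the SDK reference.

# Sketch: raise the client-level timeout for long-running requests
client = TwelveLabs(
    api_key="<YOUR_API_KEY>",
    timeout=300  # assumed to be in seconds; default is 60
)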

7. Process the results

Use a loop to iterate over the stream. Inside the loop, handle each text fragment as it arrives.

Notes
  • You can also request structured JSON responses. For instructions, examples, and best practices, see the Structured responses page.

  • Your prompts can be instructive or descriptive, or you can also phrase them as questions.

  • The platform generates text according to the model options enabled for your index, which determine the types of information the video understanding model processes (see the configuration sketch after the example below).

    Example:

    • If both the visual and audio model options are enabled, the platform generates text based on both visual and audio information.
    • If only the visual option is enabled, the platform generates text based only on visual information.
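
As a sketch, the index configuration from the complete example above would look like this for a visual-only setup:

# Sketch: an index whose model processes only visual information
index = client.indexes.create(
    index_name="<YOUR_INDEX_NAME>",
    models=[{"model_name": "pegasus1.2", "model_options": ["visual"]}]
)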