Create video embeddings

The following table lists the available models for generating video embeddings and their key characteristics:

| Model | Description | Dimensions | Clip length | Similarity metric |
| --- | --- | --- | --- | --- |
| Marengo-retrieval-2.7 | Use this model to create embeddings that you can use in various downstream tasks | 1024 | 2 to 10 seconds | Cosine similarity |

The “Marengo-retrieval-2.7” video understanding model generates embeddings for all modalities in the same latent space. This shared space enables any-to-any searches across different types of content.
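The table above lists cosine similarity as the metric for comparing embeddings in this shared space. As an illustration only, here is a minimal pure-Python sketch of the computation (toy 3-dimensional vectors; real Marengo-retrieval-2.7 embeddings have 1024 dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for two embeddings from the shared latent space.
print(cosine_similarity([0.1, 0.9, 0.0], [0.2, 0.8, 0.1]))
```

Values close to 1.0 indicate semantically similar content, regardless of the modality each embedding came from.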

Create video embeddings

To create video embeddings, you must first upload your videos, and the platform must finish processing them. Uploading and processing videos take some time. Consequently, creating embeddings is an asynchronous process that consists of three steps: uploading and processing a video, monitoring the status of the embedding task, and retrieving the embeddings.

The platform can create a single embedding for the entire video, multiple embeddings for specific segments, or both in a single request. See the Customize your embeddings section for examples of using the relevant parameters.

Prerequisites

  • You’re familiar with the concepts that are described on the Platform overview page.
  • You have an API key. To retrieve your API key, navigate to the API Key page and log in with your credentials. Then, select the Copy icon to the right of your API key to copy it to your clipboard.
  • The videos for which you wish to generate embeddings must meet the following requirements:
    • Video resolution: Must be at least 360x360 and must not exceed 3840x2160.
    • Aspect ratio: Must be between 1:1 and 16:9.
    • Video and audio formats: The video files must be encoded in the video and audio formats listed on the FFmpeg Formats Documentation page. For videos in other formats, contact us at support@twelvelabs.io.
    • Duration: Must be between 4 seconds and 2 hours (7,200s).
    • File size: Must not exceed 2 GB.
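You can pre-check these requirements locally before uploading. The following sketch is illustrative only: the validate_video helper is hypothetical, and the platform performs the authoritative validation.

```python
def validate_video(width, height, duration_sec, size_bytes):
    """Hypothetical local pre-check mirroring the documented upload limits."""
    errors = []
    if width < 360 or height < 360:
        errors.append("resolution below 360x360")
    if width > 3840 or height > 2160:
        errors.append("resolution above 3840x2160")
    aspect = width / height
    if not (1.0 <= aspect <= 16 / 9):
        errors.append("aspect ratio outside 1:1 to 16:9")
    if not (4 <= duration_sec <= 7200):
        errors.append("duration outside 4 seconds to 2 hours")
    if size_bytes > 2 * 1024**3:
        errors.append("file size above 2 GB")
    return errors

# An HD clip that meets every documented requirement.
print(validate_video(1280, 720, 120, 50_000_000))  # → []
```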

Procedure

Follow the steps in the sections below to create video embeddings.

1. Upload and process a video

Use the create method of the embed.task object to create a video embedding task. This method processes a single video and generates embeddings based on the specified parameters.

Notes
  • The platform supports uploading video files that can play without additional user interaction or custom video players. Ensure your URL points to the raw video file, not a web page containing the video. Links to third-party hosting sites, cloud storage services, or videos requiring extra steps to play are not supported.
  • YouTube URLs are not supported by the Embed API at this time.

To upload a video from a publicly accessible URL, provide the following parameters:

  • The name of the video understanding model to be used. The examples in this section use “Marengo-retrieval-2.7”.
  • The video you want to upload, provided as a publicly accessible URL.
  • (Optional) Any additional parameters for customizing the timing and length of your embeddings. See the Customize your embeddings section below for details.
from twelvelabs import TwelveLabs
from typing import List
from twelvelabs.models.embed import EmbeddingsTask, SegmentEmbedding

client = TwelveLabs(api_key="<YOUR_API_KEY>")

task = client.embed.task.create(
    model_name="Marengo-retrieval-2.7",
    video_url="<YOUR_VIDEO_URL>",  # Example: https://sample-videos.com/video321/mp4/720/big_buck_bunny_720p_2mb.mp4
)
print(
    f"Created task: id={task.id} model_name={task.model_name} status={task.status}"
)

The output should look similar to the following one:

Created task: id=6659784ff24ade84c6f50e8f model_name=Marengo-retrieval-2.7 status=processing

Note that the response contains a field named id, which represents the unique identifier of your video embedding task. For a description of each field in the request and response, see the API Reference > Create a video embedding task page.

2. Monitor the status of your video embedding task

The TwelveLabs Video Understanding Platform requires some time to process videos. You can retrieve the video embeddings only after the processing is complete. To monitor the status of your video embedding task, call the wait_for_done method of the task object with the following parameters:

  • sleep_interval: A number specifying the time interval, in seconds, between successive status checks. In this example, the method checks the status every two seconds. Adjust this value to control how frequently the method checks the status.
  • callback: A callback function that the SDK executes each time it checks the status. In this example, on_task_update is the callback function. Note that the callback function takes a parameter of type EmbeddingsTask. Use this parameter to display the status of your video processing task.
def on_task_update(task: EmbeddingsTask):
    print(f"  Status={task.status}")

status = task.wait_for_done(
    sleep_interval=2,
    callback=on_task_update,
)
print(f"Embedding done: {status}")

The output should look similar to the following one:

Status=processing
Status=processing
Status=ready

After a video has been successfully uploaded and processed, the task object contains, among other information, a field named id, representing the unique identifier of your video embedding task. For a description of each field in the response, see the API Reference > Retrieve the status of a video embedding task page.
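If you prefer not to block on wait_for_done, you can poll the status yourself. The sketch below shows a generic polling helper; the fetch_status callable is a stand-in for a function that returns the current task status (for example, re-fetching the task with the SDK and reading its status field — an assumption about how you would wire it up), and the toy iterator simulates the status sequence:

```python
import time

def poll_until_done(fetch_status, sleep_interval=2.0, done_states=("ready", "failed")):
    """Poll fetch_status() until it returns a terminal state, then return it."""
    while True:
        status = fetch_status()
        print(f"  Status={status}")
        if status in done_states:
            return status
        time.sleep(sleep_interval)

# Toy stand-in for a status lookup; simulates two checks before completion.
statuses = iter(["processing", "processing", "ready"])
print(poll_until_done(lambda: next(statuses), sleep_interval=0))
```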

3. Retrieve the embeddings

Once the platform has finished processing your video, you can retrieve the embeddings by invoking the retrieve method of the task object:

def print_segments(segments: List[SegmentEmbedding]):
    for segment in segments:
        print(
            f"  embedding_scope={segment.embedding_scope} start_offset_sec={segment.start_offset_sec} end_offset_sec={segment.end_offset_sec}"
        )
        print(f"  embeddings: {', '.join(str(value) for value in segment.embeddings_float)}")

task = task.retrieve()
if task.video_embedding is not None and task.video_embedding.segments is not None:
    print_segments(task.video_embedding.segments)

Note the following about the response:

  • When you use the default behavior of the platform, and no additional parameters are specified, the response should look similar to the following one:

    embedding_scope=clip start_offset_sec=0.0 end_offset_sec=6.0
    embeddings: -0.06261667, -0.012716668, 0.024836386
    ...
    embedding_scope=clip start_offset_sec=6.0 end_offset_sec=12.0
    embeddings: -0.050863456, -0.014198959, 0.038503144
    ...
    embedding_scope=clip start_offset_sec=156.0 end_offset_sec=160.52
    embeddings: -0.00094736926, -0.010648306, 0.054438476
    ...

    In this example response, each object in the segments array corresponds to a segment of the video and includes the following fields:

    • start_offset_sec: The start time of the segment, in seconds.
    • end_offset_sec: The end time of the segment, in seconds.
    • embedding_scope: Set to clip, indicating that the embedding is for a clip.
    • embeddings_float: An array of floats that represents the embedding, printed as embeddings in the output above.
  • When you create embeddings for specific video clips and the entire video simultaneously by setting the value of the video_embedding_scopes parameter to ["clip", "video"], the response should look similar to the following one:

    embedding_scope=clip start_offset_sec=0.0 end_offset_sec=6.0
    embeddings: -0.06261667, -0.012716668, 0.024836386
    ...
    embedding_scope=clip start_offset_sec=6.0 end_offset_sec=12.0
    embeddings: -0.050863456, -0.014198959, 0.038503144
    ...
    embedding_scope=clip start_offset_sec=156.0 end_offset_sec=160.52
    embeddings: -0.00094736926, -0.010648306, 0.054438476
    ...
    embedding_scope=video start_offset_sec=0.0 end_offset_sec=160.52
    embeddings: -0.023929736, -0.012013472, 0.043946236

    Note the following about this example response:

    • The first three embeddings have the embedding_scope field set to clip. Each corresponds to a specific segment of the video you provided.
    • The fourth embedding has the embedding_scope field set to video. This embedding corresponds to the entire video.

For a description of each field in the request and response, see the API Reference > Retrieve video embeddings page.
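Once retrieved, clip-scoped embeddings can be used directly for similarity search, such as finding the clip closest to a query embedding. A minimal sketch, where the most_similar_clip helper is hypothetical and segments mimics the SegmentEmbedding fields used above with toy 3-dimensional vectors (real embeddings have 1024 dimensions):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def most_similar_clip(segments, query):
    """Return (start, end) of the clip-scoped segment closest to `query`."""
    clips = [s for s in segments if s["embedding_scope"] == "clip"]
    best = max(clips, key=lambda s: cosine(s["embeddings_float"], query))
    return best["start_offset_sec"], best["end_offset_sec"]

# Toy data shaped like the retrieved segments: two clips plus a video-level embedding.
segments = [
    {"embedding_scope": "clip", "start_offset_sec": 0.0, "end_offset_sec": 6.0,
     "embeddings_float": [1.0, 0.0, 0.0]},
    {"embedding_scope": "clip", "start_offset_sec": 6.0, "end_offset_sec": 12.0,
     "embeddings_float": [0.0, 1.0, 0.0]},
    {"embedding_scope": "video", "start_offset_sec": 0.0, "end_offset_sec": 12.0,
     "embeddings_float": [0.5, 0.5, 0.0]},
]
print(most_similar_clip(segments, [0.1, 0.9, 0.0]))  # → (6.0, 12.0)
```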

Customize your embeddings

By default, the platform splits each video into 6-second clips and creates one embedding per clip. You can modify this behavior as follows:

The optional video_embedding_scopes parameter determines the scope of the generated embeddings. It is an array of strings, and valid values are the following:

  • ["clip"]: Creates embeddings for multiple clips, as specified by the video_start_offset_sec, video_end_offset_sec, and video_clip_length parameters described below. This is the default value.
  • ["clip", "video"]: Creates embeddings for specific video segments and the entire video in a single request.

The following optional parameters customize the timing and length of the embeddings:

  • video_start_offset_sec: Specifies the start offset in seconds from the beginning of the video where processing should begin.
  • video_end_offset_sec: Specifies the end offset in seconds from the beginning of the video where processing should end.
  • video_clip_length: Specifies the desired duration in seconds for each clip for which the platform generates an embedding. It can be between 2 and 10 seconds.

Note that the platform truncates a final segment shorter than 2 seconds. For example, when a 31-second video is divided into 6-second segments, the trailing 1-second segment is truncated. Truncation applies only to the last segment, and only when it falls short of the 2-second minimum.
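The segment boundaries implied by these parameters can be sketched as follows. This is an illustrative model only: clip_segments is a hypothetical helper, not part of the SDK, and it assumes a trailing segment shorter than 2 seconds receives no embedding.

```python
def clip_segments(duration, clip_length=6.0, start=0.0, end=None, min_length=2.0):
    """Hypothetical sketch of fixed-length segmentation with a minimum
    trailing-segment length; returns (start, end) pairs in seconds."""
    end = duration if end is None else min(end, duration)
    segments, t = [], start
    while t < end:
        seg_end = min(t + clip_length, end)
        if seg_end - t >= min_length:
            segments.append((t, seg_end))
        t = seg_end
    return segments

# A 31-second video: five full 6-second clips; the 1-second remainder is dropped.
print(clip_segments(31.0))
```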

Below are examples of how you can customize the timing and length of your embeddings:

  • To split the video into multiple 5-second segments and create an embedding for each:

    task = client.embed.task.create(
        model_name="Marengo-retrieval-2.7",
        video_url="<YOUR_VIDEO_URL>",
        video_clip_length=5,
    )
  • To split a video into multiple 5-second segments from the 30-second mark to the 60-second mark and create an embedding for each:

    task = client.embed.task.create(
        model_name="Marengo-retrieval-2.7",
        video_url="<YOUR_VIDEO_URL>",
        video_clip_length=5,
        video_start_offset_sec=30,
        video_end_offset_sec=60,
    )
  • To create a single embedding for a video segment from the 2-second mark to the 12-second mark:

    task = client.embed.task.create(
        model_name="Marengo-retrieval-2.7",
        video_url="<YOUR_VIDEO_URL>",
        video_start_offset_sec=2,
        video_end_offset_sec=12,
        video_embedding_scopes=["video"],
    )
  • To split the video into multiple 6-second segments and create embeddings for each segment as well as the entire video, set the value of the video_embedding_scopes parameter to ["clip", "video"]:

    task = client.embed.task.create(
        model_name="Marengo-retrieval-2.7",
        video_url="<YOUR_VIDEO_URL>",
        video_embedding_scopes=["clip", "video"],
    )

Retrieve embeddings for indexed videos

The platform allows you to retrieve embeddings for videos you’ve already uploaded and indexed. The embeddings are generated using video scene detection. Video scene detection enables the segmentation of videos into semantically meaningful parts. It involves identifying boundaries between scenes, defined as a series of frames depicting a continuous action or theme. Each segment is between 2 and 10 seconds.

Prerequisites

Your video must be indexed with the Marengo video understanding model version 2.7 or later. For details on enabling this model for an index, see the Create indexes page.

Procedure

Call the retrieve method of the index.video object with the following parameters:

  • index_id: The unique identifier of your index.
  • video_id: The unique identifier of your video.
  • embed: Set this parameter to True to retrieve the embeddings.
from twelvelabs import TwelveLabs
from typing import List
from twelvelabs.models.embed import SegmentEmbedding

def print_segments(segments: List[SegmentEmbedding], max_elements: int = 5):
    for segment in segments:
        print(
            f"  embedding_scope={segment.embedding_scope} start_offset_sec={segment.start_offset_sec} end_offset_sec={segment.end_offset_sec}"
        )
        print(f"  embeddings: {segment.embeddings_float[:max_elements]}")

client = TwelveLabs(api_key="<YOUR_API_KEY>")

video = client.index.video.retrieve(
    index_id="<YOUR_INDEX_ID>", id="<YOUR_VIDEO_ID>", embed=True
)
if video.embedding:
    print(f"Model_name={video.embedding.model_name}")
    print("Embeddings:")
    print_segments(video.embedding.video_embedding.segments)

For details about each field in the response, see the Retrieve video information page.
