Create video embeddings
Video embeddings are generated by the “Marengo-retrieval-2.7” video understanding model, which creates embeddings for all modalities in the same latent space. This shared space enables any-to-any searches across different types of content.
Create video embeddings
To create video embeddings, you must first upload your videos, and the platform must finish processing them. Uploading and processing videos require some time. Consequently, creating embeddings is an asynchronous process that comprises three steps:
1. Create a video embedding task that uploads and processes your video. The platform returns the unique identifier of your task.
2. Use the unique identifier of your task to check its status periodically until it’s completed.
3. After the video embedding task is completed, retrieve the video embeddings by providing the task identifier.
The platform can create a single embedding for the entire video, multiple embeddings for specific segments, or both. See the Customize your embeddings section for examples of using the parameters that control this behavior.
Prerequisites
- You’re familiar with the concepts that are described on the Platform overview page.
- You have an API key. To retrieve your API key, navigate to the API Key page and log in with your credentials. Then, select the Copy icon to the right of your API key to copy it to your clipboard.
- The videos for which you wish to generate embeddings must meet the following requirements:
- Video resolution: Must be at least 360x360 and must not exceed 3840x2160.
- Aspect ratio: Must be between 1:1 and 16:9.
- Video and audio formats: The video files must be encoded in the video and audio formats listed on the FFmpeg Formats Documentation page. For videos in other formats, contact us at support@twelvelabs.io.
- Duration: Must be between 4 seconds and 2 hours (7,200s).
- File size: Must not exceed 2 GB.
Procedure
Follow the steps in the sections below to create video embeddings.
1. Upload and process a video
Use the `create` method of the `embed.task` object to create a video embedding task. This method processes a single video and generates embeddings based on the specified parameters.
You can provide the video either as a publicly accessible URL or as a file from your local file system.
Notes
- The platform supports uploading video files that can play without additional user interaction or custom video players. Ensure your URL points to the raw video file, not a web page containing the video. Links to third-party hosting sites, cloud storage services, or videos requiring extra steps to play are not supported.
- YouTube URLs are not supported by the Embed API at this time.
To upload a video from a publicly accessible URL, provide the following parameters:
- The name of the video understanding model to be used. The examples in this section use “Marengo-retrieval-2.7”.
- The video you want to upload, provided as a publicly accessible URL.
- (Optional) Any additional parameters for customizing the timing and length of your embeddings. See the Customize your embeddings section below for details.
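The following is a minimal sketch using the TwelveLabs Python SDK. The API key and URL are placeholders, and the exact parameter names (`model_name`, `video_url`) may vary by SDK version:

```python
from twelvelabs import TwelveLabs

# Placeholder API key; replace with your own.
client = TwelveLabs(api_key="<YOUR_API_KEY>")

# Create a video embedding task from a publicly accessible URL (placeholder).
task = client.embed.task.create(
    model_name="Marengo-retrieval-2.7",
    video_url="https://example.com/video.mp4",
)
print(f"Created task: id={task.id}")
```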
The output should look similar to the following one:
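A hypothetical example, matching the print statement in the sketch above (the identifier is illustrative):

```
Created task: id=663da73b31cdd0c1f638a8e6
```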
Note that the response contains a field named `id`, which represents the unique identifier of your video embedding task. For a description of each field in the request and response, see the API Reference > Create a video embedding task page.
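To upload a video from your local file system instead, pass the file rather than a URL. A sketch, assuming the SDK accepts a `video_file` parameter (verify against your SDK version):

```python
# Create a video embedding task from a local file (path is a placeholder).
task = client.embed.task.create(
    model_name="Marengo-retrieval-2.7",
    video_file="/path/to/video.mp4",
)
print(f"Created task: id={task.id}")
```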
2. Monitor the status of your video embedding task
The TwelveLabs Video Understanding Platform requires some time to process videos. You can retrieve the video embeddings only after the processing is complete. To monitor the status of your video embedding task, call the `wait_for_done` method of the `task` object with the following parameters:
- `sleep_interval`: A number specifying the time interval, in seconds, between successive status checks. In this example, the method checks the status every two seconds. Adjust this value to control how frequently the method checks the status.
- `callback`: A callback function that the SDK executes each time it checks the status. In this example, `on_task_update` is the callback function. Note that the callback function takes a parameter of type `EmbeddingsTask`. Use this parameter to display the status of your video processing task.
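A sketch of the monitoring step, continuing from the task created above (the import path for `EmbeddingsTask` is an assumption and may vary by SDK version):

```python
from twelvelabs.models.embed import EmbeddingsTask

def on_task_update(task: EmbeddingsTask):
    # Executed by the SDK on every status check.
    print(f"  Status={task.status}")

status = task.wait_for_done(
    sleep_interval=2,         # check the status every two seconds
    callback=on_task_update,
)
print(f"Embedding done: {status}")
```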
The output should look similar to the following one:
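A hypothetical run (the number of status checks depends on the length of your video):

```
  Status=processing
  Status=processing
  Status=ready
Embedding done: ready
```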
After a video has been successfully uploaded and processed, the `task` object contains, among other information, a field named `id`, representing the unique identifier of your video embedding task. For a description of each field in the response, see the API Reference > Retrieve the status of a video embedding task page.
3. Retrieve the embeddings
Once the platform has finished processing your video, you can retrieve the embeddings by invoking the `retrieve` method of the `task` object:
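A sketch, reusing the task identifier from step 1; the attribute names on the returned object are assumptions that mirror the raw response fields described below:

```python
task = client.embed.task.retrieve(task.id)

# Attribute names are assumed to mirror the raw response fields.
for embedding in task.video_embeddings:
    print(
        f"scope={embedding.embedding_scope} "
        f"start={embedding.start_offset_sec} end={embedding.end_offset_sec} "
        f"dimensions={len(embedding.values)}"
    )
```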
Note the following about the response:
- When you use the default behavior of the platform, and no additional parameters are specified, the response should look similar to the following one:
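  A hypothetical, truncated example (values abbreviated):

  ```json
  {
    "video_embeddings": [
      {
        "start_offset_sec": 0.0,
        "end_offset_sec": 6.0,
        "embedding_scope": "clip",
        "values": [0.0373, -0.0082, 0.0151, ...]
      },
      {
        "start_offset_sec": 6.0,
        "end_offset_sec": 12.0,
        "embedding_scope": "clip",
        "values": [0.0256, -0.0194, 0.0098, ...]
      }
    ]
  }
  ```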
  In this example response, each object of the `video_embeddings` array corresponds to a segment and includes the following fields:
  - `start_offset_sec`: Start time of the segment.
  - `end_offset_sec`: End time of the segment.
  - `embedding_scope`: The value of the `embedding_scope` field is set to `clip`. This specifies that the embedding is for a clip.
  - `values`: An array of floats that represents the embedding.
- When you create embeddings for specific video clips and the entire video simultaneously by setting the value of the `video_embedding_scopes` parameter to `["clip", "video"]`, the response should look similar to the following one:
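  A hypothetical, truncated response for a 17-second video:

  ```json
  {
    "video_embeddings": [
      { "start_offset_sec": 0.0,  "end_offset_sec": 6.0,  "embedding_scope": "clip",  "values": [...] },
      { "start_offset_sec": 6.0,  "end_offset_sec": 12.0, "embedding_scope": "clip",  "values": [...] },
      { "start_offset_sec": 12.0, "end_offset_sec": 17.0, "embedding_scope": "clip",  "values": [...] },
      { "start_offset_sec": 0.0,  "end_offset_sec": 17.0, "embedding_scope": "video", "values": [...] }
    ]
  }
  ```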
  Note the following about this example response:
  - The first three embeddings have the `embedding_scope` field set to `clip`. Each corresponds to a specific segment of the video you provided.
  - The fourth embedding has the `embedding_scope` field set to `video`. This embedding corresponds to the entire video.
For a description of each field in the request and response, see the API Reference > Retrieve video embeddings page.
Customize your embeddings
The default behavior is to create multiple embeddings, each 6 seconds long, for each video. You can modify the default behavior as follows:
Embedding scope
The optional `video_embedding_scopes` parameter determines the scope of the generated embeddings. It is an array of strings, and valid values are the following:
- `["clip"]`: Creates embeddings for multiple clips, as specified by the `video_start_offset_sec`, `video_end_offset_sec`, and `video_clip_length` parameters described below. This is the default value.
- `["clip", "video"]`: Creates embeddings for specific video segments and the entire video in a single request.
Embedding settings
The following optional parameters customize the timing and length of the embeddings:
- `video_start_offset_sec`: Specifies the start offset in seconds from the beginning of the video where processing should begin.
- `video_end_offset_sec`: Specifies the end offset in seconds from the beginning of the video where processing should end.
- `video_clip_length`: Specifies the desired duration in seconds for each clip for which the platform generates an embedding. It can be between 2 and 10 seconds.
Note that the platform automatically truncates video segments shorter than 2 seconds. For a 31-second video divided into 6-second segments, the final 1-second segment will be truncated. This truncation only applies to the last segment if it does not meet the minimum length requirement of 2 seconds.
Below are examples of how you can customize the timing and length of your embeddings:
- To split the video into multiple 5-second segments and create an embedding for each:
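  A sketch, reusing the `client` from the procedure above (the URL is a placeholder):

  ```python
  task = client.embed.task.create(
      model_name="Marengo-retrieval-2.7",
      video_url="https://example.com/video.mp4",
      video_clip_length=5,  # one embedding per 5-second segment
  )
  ```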
- To split a video into multiple 5-second segments from the 30-second mark to the 60-second mark and create an embedding for each:
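  A sketch under the same assumptions:

  ```python
  task = client.embed.task.create(
      model_name="Marengo-retrieval-2.7",
      video_url="https://example.com/video.mp4",
      video_clip_length=5,
      video_start_offset_sec=30,  # start processing at the 30-second mark
      video_end_offset_sec=60,    # stop processing at the 60-second mark
  )
  ```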
- To create a single embedding for a video segment from the 2-second mark to the 12-second mark:
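  One way to do this is to make the clip length equal to the length of the segment, so the platform produces a single clip embedding (a sketch under the same assumptions):

  ```python
  task = client.embed.task.create(
      model_name="Marengo-retrieval-2.7",
      video_url="https://example.com/video.mp4",
      video_start_offset_sec=2,   # segment starts at the 2-second mark
      video_end_offset_sec=12,    # segment ends at the 12-second mark
      video_clip_length=10,       # equal to the segment length, so one embedding
  )
  ```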
- To split the video into multiple 6-second segments and create embeddings for each segment as well as the entire video, set the value of the `video_embedding_scopes` parameter to `["clip", "video"]`:
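  A sketch under the same assumptions:

  ```python
  task = client.embed.task.create(
      model_name="Marengo-retrieval-2.7",
      video_url="https://example.com/video.mp4",
      video_embedding_scopes=["clip", "video"],  # clips use the default 6-second length
  )
  ```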
Retrieve embeddings for indexed videos
The platform allows you to retrieve embeddings for videos you’ve already uploaded and indexed. The embeddings are generated using video scene detection. Video scene detection enables the segmentation of videos into semantically meaningful parts. It involves identifying boundaries between scenes, defined as a series of frames depicting a continuous action or theme. Each segment is between 2 and 10 seconds.
Prerequisites
Your video must be indexed with the Marengo video understanding model version 2.7 or later. For details on enabling this model for an index, see the Create indexes page.
Procedure
Call the `retrieve` method of the `index.video` object with the following parameters:
- `index_id`: The unique identifier of your index.
- `video_id`: The unique identifier of your video.
- `embed`: Set this parameter to `True` to retrieve the embeddings.
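A sketch, reusing the `client` from the procedure above (identifiers are placeholders):

```python
video = client.index.video.retrieve(
    index_id="<YOUR_INDEX_ID>",
    video_id="<YOUR_VIDEO_ID>",
    embed=True,  # include the embeddings in the response
)
```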
For details about each field in the response, see the Retrieve video information page.