Create video embeddings

Note that the private beta version processes only visual information and non-speech audio, such as ambient sounds; spoken words (human speech) are excluded.

To create video embeddings, you must first upload your videos, and the platform must finish processing them. Because uploading and processing take time, creating embeddings is an asynchronous process consisting of three steps:

  1. Upload and process a video: When you start uploading a video, the platform creates a video embedding task and returns its unique task identifier.
  2. Monitor the status of your video embedding task: Check the status of the task periodically until it shows as ready. The task status indicates when the video processing is complete, and the embeddings are available for retrieval.
  3. Retrieve the embeddings: After the task status changes to ready, retrieve the video embeddings by providing the task identifier.

The platform can create a single embedding for an entire video, multiple embeddings for specific segments, or both. By default, it creates multiple embeddings, each 6 seconds long, for each video. You can modify the default behavior as follows:

  • Embedding scope: The optional video_embedding_scope parameter determines the scope of the generated embeddings, and it can have the following values:
    • video: Use this value to create an embedding of the entire video
    • clip: Use this value to create embeddings for multiple clips, as specified by the video_start_offset_sec, video_end_offset_sec, and video_clip_length parameters described below.
      You can include this parameter twice to create embeddings for specific video segments and the entire video in a single request, as shown in the sketch following the examples below.
  • Embedding settings: The following optional parameters customize the timing and length of the embeddings:
    • video_start_offset_sec: Specifies the start offset in seconds from the beginning of the video where processing should begin.
    • video_end_offset_sec: Specifies the end offset in seconds from the beginning of the video where processing should end.
    • video_clip_length: Specifies the desired duration in seconds for each clip for which the platform generates an embedding.

Note that the platform automatically truncates video segments shorter than 2 seconds. For example, for a 31-second video divided into 6-second segments, the final 1-second segment is truncated. This truncation applies only to the last segment and only when it does not meet the minimum length requirement of 2 seconds.
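
The following sketch is not part of the API; it merely reproduces the segmentation rule described above, assuming that a truncated final segment is simply dropped. The clip_boundaries helper and its defaults are illustrative only:

# A hypothetical helper that mirrors the documented segmentation rule.
# Assumption: a final segment shorter than 2 seconds is dropped entirely.
def clip_boundaries(duration_sec, clip_length=6, start=0, end=None, min_length=2):
    end = duration_sec if end is None else min(end, duration_sec)
    boundaries = []
    t = start
    while t < end:
        boundaries.append((t, min(t + clip_length, end)))
        t += clip_length
    # Drop the final segment if it does not meet the 2-second minimum
    if boundaries and (boundaries[-1][1] - boundaries[-1][0]) < min_length:
        boundaries.pop()
    return boundaries

print(clip_boundaries(31))  # [(0, 6), (6, 12), (12, 18), (18, 24), (24, 30)]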

Examples:

  • To split a video into multiple 6-second segments and create an embedding for each:
    Do not provide the video_embedding_scope, video_start_offset_sec, video_end_offset_sec, or video_clip_length parameters.
  • To split a video into multiple 5-second segments and create an embedding for each:
    video_clip_length = 5
    
  • To split a video into multiple 5-second segments from the 30-second mark to the 60-second mark and create an embedding for each:
    video_embedding_scope = clip  
    video_clip_length = 5  
    video_start_offset_sec = 30  
    video_end_offset_sec = 60
    
  • To create a single embedding for the entire video:
    video_embedding_scope = video
    
  • To create a single embedding for a video segment from the 2-second mark to the 12-second mark:
    video_embedding_scope = video
    video_start_offset_sec = 2
    video_end_offset_sec = 12
    
  • To split a video into multiple 6-second segments and create embeddings for each segment as well as the entire video:
    video_embedding_scope = clip
    video_embedding_scope = video
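
The Procedure section below shows the full upload request. As a preview, the following sketch shows how these optional parameters might be added to the multipart request body when using the Python requests library (as in the Procedure examples); passing a list of tuples lets you repeat the video_embedding_scope field:

# A sketch only: extend the multipart body used in the Procedure section below.
# A list of tuples allows the same field (video_embedding_scope) to appear twice.
data = [
    ("engine_name", (None, "Marengo-retrieval-2.6")),
    ("video_url", (None, "<YOUR_VIDEO_URL>")),
    ("video_embedding_scope", (None, "clip")),   # embeddings for each segment
    ("video_embedding_scope", (None, "video")),  # plus one for the entire video
    ("video_clip_length", (None, "5")),          # optional: 5-second segments
]
# response = requests.post(EMBED_TASKS_URL, headers=headers, files=data)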
    

Prerequisites

  • You’re familiar with the concepts that are described on the Platform overview page.
  • You have an API key. To retrieve your API key, navigate to the Dashboard page and log in with your credentials. Then, select the Copy icon to the right of your API key to copy it to your clipboard.
  • The videos for which you wish to generate embeddings must meet the following requirements:
    • Duration: Must be between 4 seconds and 2 hours (7,200s).
    • File size: Must not exceed 2 GB.
    • Video resolution: Must be greater than or equal to 360p and less than or equal to 4K. For consistent search results, Twelve Labs recommends you upload 360p videos.
    • Video and audio formats: The video files must be encoded in the video and audio formats listed on the FFmpeg Formats Documentation page. For videos in other formats, contact us at [email protected].
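
If you want to verify a local file against these requirements before uploading, the following sketch uses ffprobe (part of FFmpeg, assumed to be installed on your system); it is not part of the Twelve Labs API:

# A sketch only: check duration, size, and resolution with ffprobe before uploading.
import json
import subprocess

def check_video(path):
    probe = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=width,height:format=duration,size",
         "-of", "json", path],
        capture_output=True, text=True, check=True)
    info = json.loads(probe.stdout)
    duration = float(info["format"]["duration"])
    size_bytes = int(info["format"]["size"])
    shorter_side = min(info["streams"][0]["width"], info["streams"][0]["height"])
    assert 4 <= duration <= 7200, "Duration must be between 4 seconds and 2 hours"
    assert size_bytes <= 2 * 1024**3, "File size must not exceed 2 GB"
    assert shorter_side >= 360, "Resolution must be at least 360p (4K upper bound not checked here)"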

Procedure

Follow the steps in the sections below to create a video embedding.

1. Upload and process a video

Invoke the POST method of the /embed/tasks endpoint. The request body must include the video you want to upload, provided as a local file or a publicly accessible URL. This method creates a video embedding task and returns its unique identifier.

From a publicly accessible URL

The following example code uploads a file from a publicly accessible URL. Ensure you replace the placeholders surrounded by <> with your values.

# python -m pip install requests

import requests
import time  # Used in step 2 to poll the status of the video embedding task

# Construct the URL of the `/embed/tasks` endpoint
BASE_URL = "https://api.twelvelabs.io"
VERSION = "v1.2"
EMBED_TASKS_URL = f"{BASE_URL}/{VERSION}/embed/tasks"

# Set the headers of the request
headers = {
    "x-api-key": "<YOUR_API_KEY>"
}

# Specify the body of the request
data = {
    "engine_name": (None, "Marengo-retrieval-2.6"), # None indicates no file type, just text
    "video_url": (None, "https://sample-videos.com/video321/mp4/720/big_buck_bunny_720p_2mb.mp4")
}

# Invoke the POST method of the `/embed/tasks` endpoint
response = requests.post(EMBED_TASKS_URL, headers=headers, files=data) # Use the 'files' parameter to enforce multipart/form-data Content-Type

# Print the status code and the response
print(f"Status Code: {response.status_code}")
print("Response:")
print(response.json())
// npm install axios form-data

import axios from 'axios'
import FormData from 'form-data';
import fs from 'fs';

// Construct the URL of the `/embed/tasks` endpoint
const BASE_URL = 'https://api.twelvelabs.io';
const VERSION = 'v1.2';
const EMBED_TASKS_URL = `${BASE_URL}/${VERSION}/embed/tasks`;


// Set the headers of the request
const headers = {
    'x-api-key': '<YOUR_API_KEY>'
};

// Specify the body of the request
const data = new FormData();
data.append('engine_name', 'Marengo-retrieval-2.6');
data.append('video_url', '<YOUR_VIDEO_URL>'); // Example: https://sample-videos.com/video321/mp4/720/big_buck_bunny_720p_2mb.mp4

// Invoke the POST method of the `/embed/tasks` endpoint
const resp = await axios.post(
  EMBED_TASKS_URL,
  data,
  { headers: { ...headers, ...data.getHeaders() } } // form-data adds the multipart boundary
)
let { data: response } = resp;
console.log(`Status code: ${resp.status}`)
console.log(response)

The output should look similar to the following one:

Status Code: 200
Response:
{'_id': '663e05554d11aff765088aea'}

Note that the response contains a field named _id, which represents the unique identifier of your video embedding task. For a description of each field in the request and response, see the API Reference > Create a video embedding task page.
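
The remaining steps assume that this identifier is stored in a variable named TASK_ID. For the Python example above, you can capture it from the response as follows (the local file example below does the same):

# Store the unique identifier of the video embedding task for the next steps
TASK_ID = response.json().get("_id")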

From the local file system

To upload a file from the local file system, provide the video_file parameter in the body of the request, as shown below:

# Specify the body of the request
files = {
    "video_file": ("<YOUR_FILE_NAME>", open("<YOUR_FILE_PATH>", "rb"), "video/mp4")
}
data = {
    "engine_name": "Marengo-retrieval-2.6",
}

# Invoke the POST method of the `/embed/tasks` endpoint
response = requests.post(EMBED_TASKS_URL, headers=headers, files=files, data=data)
TASK_ID = response.json().get("_id")
// Specify the body of the request
const data = new FormData();
data.append('engine_name', 'Marengo-retrieval-2.6');
data.append('video_file', fs.createReadStream('<YOUR_FILE_PATH>'), '<YOUR_FILE_NAME>');

// Invoke the POST method of the `/embed/tasks` endpoint
const resp = await axios.post(
  EMBED_TASKS_URL,
  data,
  { headers: { ...headers, ...data.getHeaders() } } // form-data adds the multipart boundary
)
const TASK_ID = resp.data._id;

2. Monitor the status of your video embedding task

Before you can retrieve the video embeddings, you must check the status of the video embedding task to ensure it has completed successfully by performing the following steps:

  1. Retrieve the status of your video embedding task by invoking the GET method of the /embed/tasks/{task-id}/status endpoint.
  2. Check the value of the status field:
    • If the status field is ready, the video processing is complete. Proceed to retrieve the embeddings.
    • If the status field is not ready, wait and check again later until the status changes to ready.

The following example code assumes that the unique identifier of your task is stored in a variable named TASK_ID:

# Construct the URL of the `/embed/tasks/{task-id}/status` endpoint
TASKS_STATUS_URL = f"{EMBED_TASKS_URL}/{TASK_ID}/status"

while True:
    response = requests.get(TASKS_STATUS_URL, headers=headers)
    STATUS = response.json().get("status")
    print(f"Status: {STATUS}")
    if STATUS == "ready":
        break
    time.sleep(10)  # Wait 10 seconds before checking again
print("Your video has successfully been processed, and you can now retrieve the embeddings.")
// Construct the URL of the `/embed/tasks/{task-id}/status` endpoint
const TASKS_STATUS_URL = `${EMBED_TASKS_URL}/${TASK_ID}/status`;

while (true) {
    response = await axios.get(TASKS_STATUS_URL, { headers });
    const STATUS = response.data.status;
    console.log(`Status: ${STATUS}`);
    if (STATUS === 'ready') {
        break;
    }
    await new Promise((resolve) => setTimeout(resolve, 10000)); // Wait 10 seconds before checking again
}
console.log('Your video has successfully been processed, and you can now retrieve the embeddings.');

The output should look similar to the following one:

Status: processing
Status: ready
Your video has successfully been processed, and you can now retrieve the embeddings.

For a description of each field in the request and response, see the API Reference > Retrieve the status of a video embedding task page.

3. Retrieve the embeddings

When the video embedding task status is ready, retrieve the embeddings by invoking the GET method of the /embed/tasks/{task-id} endpoint.

The following example code assumes that the unique identifier of your task is stored in a variable named TASK_ID:

# Construct the URL of the `/embed/tasks/{task-id}` endpoint
RETRIEVE_EMBEDDING_URL = f"{EMBED_TASKS_URL}/{TASK_ID}"

response = requests.get(RETRIEVE_EMBEDDING_URL, headers=headers)

print(f"Status Code: {response.status_code}")
print("Response:")
print(response.json())
// Construct the URL of the `/embed/tasks/{task-id}` endpoint
const RETRIEVE_EMBEDDING_URL = `${EMBED_TASKS_URL}/${TASK_ID}`;

response = await axios.get(RETRIEVE_EMBEDDING_URL, { headers });
console.log(`Status Code: ${response.status}`);
console.log('Response:');
console.log(JSON.stringify(response.data));

Note the following about the response:

  • When you use the default behavior of the platform and no additional parameters are specified, the response should look similar to the following one:

    {
      "_id": "663e16ac4d11aff765088b3a",
      "engine_name": "Marengo-retrieval-2.6",
      "status": "ready",
      "video_embeddings": [
        {
          "start_offset_sec": 0,
          "end_offset_sec": 6,
          "embedding_scope": "clip",
          "embedding": {
            "float": [
              -0.060086973,
              0.016479108,
              ...
            ]
          }
        },
        {
          "start_offset_sec": 6,
          "end_offset_sec": 12,
          "embedding_scope": "clip",
          "embedding": {
            "float": [
              -0.056660935,
              0.012404642,
              ...
            ]
          }
        },
        {
          "start_offset_sec": 12,
          "end_offset_sec": 18,
          "embedding_scope": "clip",
          "embedding": {
            "float": [
              -0.05971131,
              -0.00859428,
              ...
            ]
          }
        }
      ]
    }
    

    In this example response, each object of the video_embeddings array corresponds to a segment and includes the following fields:

    • start_offset_sec: The start time of the segment, in seconds.
    • end_offset_sec: The end time of the segment, in seconds.
    • embedding_scope: Specifies that the embedding is for a clip.
    • embedding: An object whose float field contains the array of floating-point numbers that represents the embedding.
  • When you create a single embedding for the entire video by setting the value of the video_embedding_scope parameter to video, the response should look similar to the following one:

    {
      "_id": "66418f85c70bb578439bd8ee",
      "engine_name": "Marengo-retrieval-2.6",
      "status": "ready",
      "video_embeddings": [
        {
          "start_offset_sec": 0,
          "end_offset_sec": 18,
          "embedding_scope": "video",
          "embedding": {
            "float": [
              -0.05881974,
              0.0067631565,
              ...
            ]
          }
        }
      ]
    }
    

    Note the following about this example response:

    • The video_embeddings array contains a single embedding that corresponds to the entire video.
    • The value of the embedding_scope field is set to video. This specifies that the embedding is for the entire video.

    For a description of each field in the request and response, see the API Reference > Retrieve video embeddings page.

  • When you create embeddings for specific video clips and the entire video simultaneously by specifying the video_embedding_scope parameter twice with both the video and clip values, the response should look similar to the following one:

    {
      "_id": "66444e431e13b17a8c2e67ba",
      "engine_name": "Marengo-retrieval-2.6",
      "status": "ready",
      "video_embeddings": [
        {
          "start_offset_sec": 0,
          "end_offset_sec": 6,
          "embedding_scope": "clip",
          "embedding": {
            "float": [
              -0.060086973,
              0.016479108,
              ...
            ]
          }
        },
        {
          "start_offset_sec": 6,
          "end_offset_sec": 12,
          "embedding_scope": "clip",
          "embedding": {
            "float": [
              -0.056660935,
              0.012404642,
              ...
            ]
          }
        },
        {
          "start_offset_sec": 12,
          "end_offset_sec": 18,
          "embedding_scope": "clip",
          "embedding": {
            "float": [
              -0.05971131,
              0.016484642,
              ...
            ]
          }
        },
        {
          "start_offset_sec": 0,
          "end_offset_sec": 18,
          "embedding_scope": "video",
          "embedding": {
            "float": [
              -0.05881974,
              -0.00859428,
              ...
            ]
          }
        }
      ]
    }
    

    Note the following about this example response:

    • The first three embeddings have the embedding_scope field set to clip. Each corresponds to a specific segment of the video you provided.
    • The fourth embedding has the embedding_scope field set to video. This embedding corresponds to the entire video.
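
Once retrieved, the embeddings can be used directly for downstream tasks such as similarity search. The following sketch is illustrative only; it assumes the third example response above (both clip and video scopes) and that numpy is installed, and it compares each clip embedding with the video-scope embedding using cosine similarity:

# A sketch only: parse the retrieved embeddings and compute cosine similarities.
import numpy as np

task = response.json()  # Response from the GET /embed/tasks/{task-id} call above
clips = [e for e in task["video_embeddings"] if e["embedding_scope"] == "clip"]
video = next(e for e in task["video_embeddings"] if e["embedding_scope"] == "video")

video_vec = np.array(video["embedding"]["float"])
for clip in clips:
    clip_vec = np.array(clip["embedding"]["float"])
    similarity = np.dot(video_vec, clip_vec) / (np.linalg.norm(video_vec) * np.linalg.norm(clip_vec))
    print(f"{clip['start_offset_sec']}s-{clip['end_offset_sec']}s: {similarity:.4f}")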