Create embeddings

📘 Note: The 2.7 version of the Marengo video understanding model generates embeddings that are incompatible with v2.6, which will be discontinued. If you are using v2.6 embeddings, regenerate them using v2.7.

Use the Embed API to create multimodal embeddings: contextual vector representations of your videos, texts, images, and audio files. Twelve Labs video embeddings capture the subtle cues and interactions between modalities, including visual expressions, body language, spoken words, and the overall context of the video, and encode how these modalities interrelate over time.
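As a starting point, the sketch below creates a text embedding with the twelvelabs Python SDK. This is a minimal sketch, not a definitive implementation: the method (`client.embed.create`), model name, and response attributes (`text_embedding`, `segments`, `embeddings_float`) reflect the SDK at the time of writing and should be verified against the current API reference.

```python
# A minimal sketch using the twelvelabs Python SDK (pip install twelvelabs).
# Method and attribute names are assumptions based on the SDK docs at the
# time of writing; verify them against the current API reference.
from twelvelabs import TwelveLabs

client = TwelveLabs(api_key="<YOUR_API_KEY>")

# Create an embedding for a text query or description.
res = client.embed.create(
    model_name="Marengo-retrieval-2.7",
    text="A man surfing a large wave at sunset",
)

if res.text_embedding is not None:
    for segment in res.text_embedding.segments:
        print(f"embedding dimension: {len(segment.embeddings_float)}")
```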

You can use multimodal embeddings in various downstream tasks, such as training custom multimodal models for anomaly detection, diversity sorting, sentiment analysis, and recommendations. You can also use them to build Retrieval-Augmented Generation (RAG) systems.

The Embed API provides the following benefits:

  • Flexibility for any modality: The API supports native processing of all modalities present in videos, eliminating the need for text-only or image-only models or converting videos into frames for image-based models.
  • State-of-the-art performance: Unlike traditional approaches that use CLIP-like models, which do not account for motion, action, or temporal information in videos, Twelve Labs' video-native approach ensures a more accurate and temporally coherent interpretation of your video content.
  • Unified vector space: You can integrate embeddings specific to each modality into a single, unified vector space, facilitating a more holistic understanding across all modalities. This approach surpasses traditional methods, offering a video-native understanding similar to human perception.
  • Fast and reliable: With native support for video processing, the API significantly reduces the time required for processing. This is particularly beneficial if you have a large set of videos requiring high throughput and low latency.
  • Flexible video segmentation: The API allows you to create multiple embeddings from different segments of a video or a single embedding for the entire video.
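The sketch below illustrates the flexible segmentation option from the list above: it submits a video embedding task that requests both per-clip and whole-video embeddings. The parameter names (`video_clip_length`, `video_embedding_scopes`), the polling helpers, and the sample URL are assumptions to check against the current API reference.

```python
# A hedged sketch of flexible video segmentation with the twelvelabs SDK.
# Parameter and attribute names are assumptions; verify against the API
# reference. The video URL is a hypothetical placeholder.
from twelvelabs import TwelveLabs

client = TwelveLabs(api_key="<YOUR_API_KEY>")

# Submit an asynchronous embedding task for a video.
task = client.embed.task.create(
    model_name="Marengo-retrieval-2.7",
    video_url="https://example.com/video.mp4",  # hypothetical URL
    video_clip_length=6,                        # seconds per clip embedding
    video_embedding_scopes=["clip", "video"],   # per-clip and whole-video
)

# Video embeddings are generated asynchronously; poll until the task is done,
# then retrieve the results.
task.wait_for_done(sleep_interval=5)
task = client.embed.task.retrieve(task.id)

if task.video_embedding is not None:
    for segment in task.video_embedding.segments:
        print(segment.start_offset_sec, segment.end_offset_sec)
```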

📘 Note: The platform can generate embeddings for text, audio, and image content types individually or in any combination within a single API call.
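For instance, a single request could embed text, an image, and an audio file together. In this sketch, the parameter names (`text`, `image_url`, `audio_url`) and the URLs are assumptions to verify against the current API reference.

```python
# A sketch of embedding several modalities in one call, per the note above.
# Parameter names and URLs are assumptions; verify against the API reference.
from twelvelabs import TwelveLabs

client = TwelveLabs(api_key="<YOUR_API_KEY>")

res = client.embed.create(
    model_name="Marengo-retrieval-2.7",
    text="Ocean waves crashing on a rocky shore",
    image_url="https://example.com/shore.jpg",  # hypothetical URL
    audio_url="https://example.com/waves.mp3",  # hypothetical URL
)

# Each modality present in the request yields its own embedding in the response.
for name in ("text_embedding", "image_embedding", "audio_embedding"):
    embedding = getattr(res, name, None)
    if embedding is not None:
        print(f"{name}: {len(embedding.segments)} segment(s)")
```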

Use cases

Below are several notable use cases:

  • Anomaly detection: You can use the platform to identify unusual patterns or anomalies in diverse data types. For example, you can detect and remove corrupt videos that only display a black background, thereby enhancing the quality of data set curation.
  • Diversity sorting: The platform helps you organize data to ensure a broad representation across various features, characteristics, or modalities. For example, in AI model training, especially with multimodal data, maintaining a diverse training set is crucial to minimize bias and enhance model generalization.
  • Sentiment analysis: By integrating vocal tone, facial expressions, and spoken language from the video content, the platform provides more accurate insights than traditional text-only methods. This is particularly useful in customer service to effectively gauge client satisfaction.
  • Recommendations: Use embedding-similarity scores for retrieval and ranking in recommendation systems, as in the sketch below.
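To illustrate similarity-based ranking, the self-contained sketch below scores candidate item embeddings against a query embedding with cosine similarity. The vectors are random placeholders standing in for embeddings returned by the Embed API.

```python
# A minimal ranking sketch: order candidate items by cosine similarity to a
# query embedding. The vectors here are random placeholders standing in for
# embeddings returned by the Embed API.
import numpy as np

rng = np.random.default_rng(0)
query = rng.normal(size=1024)                  # e.g. a text embedding
catalog = {f"video_{i}": rng.normal(size=1024) for i in range(5)}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank catalog items from most to least similar to the query.
ranked = sorted(
    catalog.items(),
    key=lambda item: cosine_similarity(query, item[1]),
    reverse=True,
)
for name, vector in ranked:
    print(name, round(cosine_similarity(query, vector), 3))
```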