Create embeddings

Use the platform to create multimodal embeddings for videos, text, images, and audio. These embeddings are contextual vector representations that capture interactions between modalities, such as visual expressions, body language, spoken words, and video context. You can apply them to downstream tasks such as training custom multimodal models for anomaly detection, diversity sorting, sentiment analysis, and recommendations, or building Retrieval-Augmented Generation (RAG) systems.

Key features:

  • Native multimodal support: Process all modalities natively without separate models or frame conversion.
  • State-of-the-art performance: Captures motion and temporal information for accurate video interpretation.
  • Unified vector space: Combines embeddings from different modalities for holistic understanding.
  • Fast and reliable: Reduces processing time for large video sets.
  • Flexible segmentation: Generate embeddings for video segments or the entire video.

Use cases:

  • Anomaly detection: Identify unusual patterns, such as corrupt videos with black backgrounds, to improve dataset quality.
  • Diversity sorting: Organize data for broad representation, reducing bias and improving AI model training.
  • Sentiment analysis: Combine vocal tone, facial expressions, and spoken language for accurate insights, which is particularly useful for customer service.
  • Recommendations: Use embeddings in similarity-based retrieval and ranking systems for recommendations.
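As a rough illustration of the similarity-based retrieval mentioned in the recommendations use case, the sketch below ranks items by cosine similarity to a query vector. It does not call the platform: the function names `cosine_similarity` and `rank_by_similarity` and the three-dimensional toy vectors are assumptions made for this example, and real embeddings have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates, top_k=3):
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for embeddings retrieved from the platform (hypothetical data).
catalog = {
    "video_a": [0.9, 0.1, 0.0],
    "video_b": [0.0, 1.0, 0.0],
    "video_c": [0.7, 0.7, 0.0],
}
query = [1.0, 0.0, 0.0]

ranked = rank_by_similarity(query, catalog, top_k=2)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Because embeddings from different modalities share one vector space, the query vector in a setup like this could come from a text or image embedding while the candidates are video embeddings. For large catalogs, a vector database or approximate nearest-neighbor index would typically replace the linear scan shown here.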

Retention policy

Embeddings created through the async endpoints (/embed-v2/tasks) are stored for seven days. After this period, you must recreate the embedding task to obtain the results again.
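One way to avoid requesting results that are no longer stored is to track when each task was created and check it against the seven-day window on your side. The sketch below is a minimal example under stated assumptions: it assumes you record a timezone-aware creation timestamp yourself, and the helper names `expires_at` and `is_expired` are hypothetical, not part of the platform. The exact moment the platform removes results may differ from this client-side estimate.

```python
from datetime import datetime, timedelta, timezone

# Retention window stated on this page for async embedding tasks.
RETENTION = timedelta(days=7)


def expires_at(created_at):
    """Estimate when results for a task created at `created_at` stop being available."""
    return created_at + RETENTION


def is_expired(created_at, now=None):
    """Return True if the estimated retention window has passed."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expires_at(created_at)


# Hypothetical creation time recorded when the task was submitted.
created = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
print(expires_at(created).isoformat())
```

If `is_expired` returns True, create the embedding task again rather than retrying retrieval. Storing the vectors in your own database as soon as a task completes removes the dependency on the retention window altogether.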

For details on how your usage is measured and billed, see the Pricing page.

  • Create video embeddings
  • Create text embeddings
  • Create audio embeddings
  • Create single image embeddings
  • Create text and image embeddings