Twelve Labs Video Understanding Platform uses artificial intelligence to extract information from videos. The platform identifies and interprets movements, actions, objects, individuals, sounds, on-screen text, and spoken words. Built on top of our state-of-the-art multimodal foundation model optimized for videos, the platform enables you to add rich, contextual video understanding to your applications through developer-friendly APIs.

Key capabilities of Twelve Labs for multimodal video understanding

Twelve Labs Video Understanding Platform equips developers with the following key capabilities:

  • Deep semantic search: Find the exact moment you need within your videos using natural language queries instead of tags or metadata.
  • Zero-shot classification: Use natural language to create your custom taxonomy, facilitating accurate and efficient video classification tailored to your unique use case.
  • Dynamic video-to-text generation: Distill your videos into concise summaries or custom reports. The platform offers built-in formats for generating titles, topics, summaries, hashtags, chapters, and highlights. You can also provide a prompt that describes the content and the desired output format, such as a police report, to tailor the results to your needs.
  • Intuitive integration: Embed a state-of-the-art multimodal foundation model for video understanding into your application in just a few API calls.
  • Rapid result retrieval: Receive your results within seconds.
  • Scalability: Our cloud-native distributed infrastructure seamlessly processes thousands of concurrent requests.

Twelve Labs’ Advantages

The list below summarizes the main advantages of Twelve Labs Video Understanding Platform over other video AI solutions:

  • Simplified API integration: Perform a rich set of video understanding tasks with just a few API calls. This allows you to focus on building your application rather than aggregating data from separate image and speech APIs or managing multiple data sources.
  • Natural language use: Tap into the model's capabilities using everyday language to write queries or prompts. This approach is more intuitive, flexible, and accurate than relying solely on rules, tags, or keywords.
  • Multimodal approach: The platform takes a video-first, multimodal approach that analyzes visual, audio, and textual information together. This gives it a more comprehensive understanding of your videos than traditional unimodal models that depend exclusively on text or images.
  • One-time video indexing for multiple tasks: Index your videos once to create contextual video embeddings that capture their semantics. You can then reuse these embeddings across tasks, such as search and classification, without reprocessing the videos.
  • Flexible deployment: The platform can adapt to varied business needs, with deployment options spanning on-premise, hybrid, or cloud-based environments.
  • Fine-tuning capabilities: Though our state-of-the-art foundation model for video understanding already yields highly accurate results, we can provide fine-tuning capabilities to help you get more out of the models and achieve better results with only a few examples.
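
The "index once, reuse for multiple tasks" idea from the list above can be illustrated with a minimal, self-contained sketch. This is not the Twelve Labs API: the index contents, the three-dimensional vectors, and the `search` and `classify` helpers are all hypothetical and hand-made for illustration. In a real system, the model would produce the embeddings for both the video segments and the natural language queries or labels.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical index, built once: one embedding per (video, start_sec, end_sec).
# The vectors are toy values, not real model output.
INDEX = {
    ("match.mp4", 0, 10): [0.9, 0.1, 0.0],
    ("match.mp4", 10, 20): [0.1, 0.9, 0.1],
    ("cooking.mp4", 0, 15): [0.0, 0.2, 0.9],
}

def search(query_vec, top_k=1):
    """Task 1: return the segments whose embeddings best match the query."""
    ranked = sorted(
        INDEX.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [segment for segment, _ in ranked[:top_k]]

def classify(segment, label_vecs):
    """Task 2: pick the label whose embedding is closest to the segment's."""
    vec = INDEX[segment]
    return max(label_vecs, key=lambda label: cosine(vec, label_vecs[label]))

if __name__ == "__main__":
    # Both tasks read the same stored embeddings; nothing is re-indexed.
    print(search([1.0, 0.0, 0.0]))
    print(classify(("cooking.mp4", 0, 15),
                   {"sports": [1.0, 0.5, 0.0], "food": [0.0, 0.0, 1.0]}))
```

Both helpers read from the same stored embeddings, which is the point of the sketch: the expensive step (indexing) happens once, and each later task is only a similarity comparison.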

For details on fine-tuning the models or different deployment options, please contact us at [email protected].

Discover Twelve Labs

Experience the key capabilities of the Twelve Labs Video Understanding Platform by signing up for a free account or logging in to the Playground.