API Version 1.2: Beta Release

Version 1.2 of the API is now available and introduces several major features.

For a complete list of what's new and changed in this version, please refer to the Release notes page.

The Generate API suite is currently in limited beta and accessible exclusively to a select group of users. To request access, please register on this waitlist.

Your participation and feedback during this beta stage will help improve the API ahead of its broader release.

Twelve Labs is a video understanding platform that uses artificial intelligence to extract information from videos, such as movement and actions, objects and people, sound, text on screen, and speech. These capabilities are built on top of the platform’s state-of-the-art multimodal foundation model for videos. Twelve Labs helps you add rich, contextual video understanding to your applications by offering developer-friendly APIs.

Key capabilities

The Twelve Labs Video Understanding platform equips developers with the following key capabilities:

  • Relevance: Find the exact moment you need within your video library using natural language queries.
  • Intuitive: Integrate a state-of-the-art multimodal foundation model for video understanding into your application in just a few API calls.
  • Speed: Receive your video search or classification results within seconds.
  • Scalability: Our cloud-native distributed infrastructure seamlessly processes thousands of video indexing, video search, and video classification requests.
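To make "a few API calls" more concrete, the sketch below builds a single natural-language search request without sending it. This page does not specify the API, so everything here is an assumption for illustration: the base URL is a placeholder, and the `/search` path, the `x-api-key` header, and the `index_id` and `query` body fields are hypothetical names. Consult the API reference for the actual contract.

```python
import json
import urllib.request

# Placeholder base URL. The path, header name, and body fields used below
# are assumptions for illustration, not confirmed by this page.
BASE_URL = "https://api.example.com/v1.2"


def build_search_request(api_key: str, index_id: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a natural-language video search request."""
    body = {
        "index_id": index_id,  # hypothetical: the video index to search
        "query": query,        # hypothetical: the everyday-language query
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request(
    "YOUR_API_KEY", "YOUR_INDEX_ID", "a dog catching a frisbee on the beach"
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Building the request separately from sending it keeps the example runnable offline. In a real integration, you would pass the request to an HTTP client and read the matching video moments from the response.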

Experience the key capabilities of the Twelve Labs Video Understanding platform by signing up for a free account or logging in here.


The Twelve Labs advantages

The list below provides a basic comparison between the Twelve Labs Video Understanding platform and other cloud-based solutions:

  • Simplified API integration: The platform allows you to perform a rich set of video understanding tasks with just a few API calls, enabling you to concentrate on building your application rather than aggregating data from separate image and speech APIs or managing multiple data sources.
  • Natural language querying: With Twelve Labs, you can compose simple or complex queries using everyday language to pinpoint the precise moments you're looking for within your videos. You can also define new categories to classify video content rather than relying on a limited set of pre-generated tags that are not relevant to your use cases.
  • Fine-tuning capabilities: While our state-of-the-art foundation model for video understanding already provides highly accurate results, we can provide fine-tuning capabilities to help you get more out of the models and achieve better results with only a few examples. For details on fine-tuning the models, please reach out to us at [email protected].
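As a rough illustration of the natural language querying point above, custom categories can be thought of as names paired with everyday-language descriptions, rather than a fixed tag vocabulary. The sketch below models that idea as plain data. The `Category` type, the `build_classify_payload` helper, and the `video_id`, `classes`, and `prompts` field names are all hypothetical and are not taken from this page.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Category:
    """A user-defined class described by natural-language prompts (hypothetical shape)."""
    name: str
    prompts: list[str] = field(default_factory=list)


def build_classify_payload(video_id: str, categories: list[Category]) -> str:
    """Serialize a classification request body; all field names are assumptions."""
    if not categories:
        raise ValueError("at least one category is required")
    return json.dumps({
        "video_id": video_id,
        "classes": [asdict(c) for c in categories],
    })


categories = [
    Category("cooking", ["someone chopping vegetables", "food frying in a pan"]),
    Category("sports", ["players on a field", "a crowd cheering at a game"]),
]
payload = build_classify_payload("YOUR_VIDEO_ID", categories)
print(payload)
```

Because each category is described in everyday language, adding a new one is a data change rather than a retraining step, which is the contrast with pre-generated tags drawn in the list above.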