Platform overview
Twelve Labs Video Understanding Platform, currently in beta, provides a suite of APIs that give your applications access to a state-of-the-art (“SOTA”) foundation model that understands contextual information in your videos. The API follows REST conventions and can be called from most programming languages. You can also use Postman or another REST client to send requests and inspect the responses.
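Because the API is RESTful, any language with an HTTP client can call it. The sketch below uses only Python's standard library to build (but not send) an authenticated JSON request. The base URL, version segment, `/indexes` path, and `x-api-key` header name are assumptions made for illustration, not values taken from this page; confirm them in the API reference before use.

```python
import json
import urllib.request
from typing import Optional

# Assumption: base URL and version segment are illustrative, not confirmed here.
BASE_URL = "https://api.twelvelabs.io/v1.2"


def build_request(method: str, path: str, api_key: str,
                  body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON request to a REST endpoint."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    # Assumption: the API key is passed in an "x-api-key" header.
    headers = {"x-api-key": api_key, "Accept": "application/json"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


if __name__ == "__main__":
    req = build_request("GET", "/indexes", api_key="YOUR_API_KEY")
    print(req.get_method(), req.full_url)
    # To send it: urllib.request.urlopen(req), then json.load() the response.
```

Keeping request construction separate from sending makes it easy to inspect or log a request before it goes over the network, which is the same thing a REST client such as Postman lets you do interactively.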
Architecture overview
The following diagram illustrates the architecture of the Twelve Labs Video Understanding Platform and how different parts interact:
Indexes
An index is the basic unit for organizing and storing video data, which consists of video embeddings and metadata. Indexes make this data available for information retrieval and processing.
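To make the idea of an index concrete, here is a toy in-memory structure that pairs each video's embedding with its metadata and retrieves videos by cosine similarity to a query embedding. This is a conceptual sketch only: the class names, fields, and the use of cosine similarity are illustrative assumptions and do not describe how the platform actually stores or searches data.

```python
import math
from dataclasses import dataclass, field


@dataclass
class VideoEntry:
    """Hypothetical record: one video's embedding plus its metadata."""
    video_id: str
    embedding: list[float]  # vector produced by a video understanding model
    metadata: dict = field(default_factory=dict)


@dataclass
class ToyIndex:
    """Illustrative in-memory index; not the platform's storage format."""
    name: str
    entries: list[VideoEntry] = field(default_factory=list)

    def add(self, entry: VideoEntry) -> None:
        self.entries.append(entry)

    def search(self, query: list[float], top_k: int = 3) -> list[tuple[str, float]]:
        """Rank stored videos by cosine similarity to a query embedding."""
        def cosine(a: list[float], b: list[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        scored = [(e.video_id, cosine(query, e.embedding)) for e in self.entries]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


if __name__ == "__main__":
    idx = ToyIndex("demo")
    idx.add(VideoEntry("v1", [1.0, 0.0], {"title": "cat video"}))
    idx.add(VideoEntry("v2", [0.0, 1.0], {"title": "dog video"}))
    print(idx.search([0.9, 0.1], top_k=1))
```

Storing metadata next to each embedding means a retrieval result can be mapped straight back to something human-readable, which is the practical reason an index holds both.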
Video understanding models
A video understanding model consists of a family of deep neural networks built on top of our multimodal foundation model for video understanding, offering search and summarization capabilities. For each index, you must configure the models you want to enable. See the Video understanding models page for more details about the available models and their capabilities.
Model options
Model options define the types of information that a specific model processes. The platform currently provides two model options: visual and audio. For more details, see the Model options page.
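Since models are configured per index and each model processes only the information types selected by its model options, an index-creation request has to list both. The sketch below builds such a request body and rejects options other than the two this page names. The field names `index_name`, `models`, `model_name`, and `model_options`, and the placeholder model name, are hypothetical; check the index-creation API reference for the real schema and model names.

```python
# The two model options listed on this page.
SUPPORTED_MODEL_OPTIONS = frozenset({"visual", "audio"})


def model_config(model_name: str, options: list[str]) -> dict:
    """Describe one model to enable on an index, with validated model options."""
    if not options:
        raise ValueError("enable at least one model option")
    unknown = set(options) - SUPPORTED_MODEL_OPTIONS
    if unknown:
        raise ValueError(f"unsupported model options: {sorted(unknown)}")
    # Hypothetical field names; the real API schema may differ.
    return {"model_name": model_name, "model_options": sorted(set(options))}


def index_payload(index_name: str, models: list[dict]) -> dict:
    """Build a hypothetical index-creation body listing the models to enable."""
    if not models:
        raise ValueError("an index needs at least one model")
    return {"index_name": index_name, "models": models}


if __name__ == "__main__":
    body = index_payload(
        "my-index",
        [model_config("example-model", ["visual", "audio"])],  # placeholder model name
    )
    print(body)
```

Validating options on the client side is optional, but it surfaces a typo such as `"vidual"` before a request is sent rather than as an API error.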
Query/Prompt Processing Engine
This component processes the following user inputs and returns the corresponding results to your application:
- Search queries
- Prompts for generating text from video
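The engine handles two kinds of input, so a client-side wrapper naturally dispatches on the input type. The sketch below assumes, for illustration only, that a search query targets an index while a generation prompt targets a single video; the `SearchQuery` and `GenerationPrompt` types and the `search`/`generate` callables are hypothetical stand-ins for whatever API calls your application actually makes.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class SearchQuery:
    """Hypothetical: a natural-language query run against an index."""
    text: str
    index_id: str


@dataclass
class GenerationPrompt:
    """Hypothetical: a prompt for generating text from one video."""
    text: str
    video_id: str


UserInput = Union[SearchQuery, GenerationPrompt]


def route(user_input: UserInput,
          search: Callable[[str, str], object],
          generate: Callable[[str, str], object]) -> object:
    """Send each input to the matching handler and return its result."""
    if isinstance(user_input, SearchQuery):
        return search(user_input.index_id, user_input.text)
    if isinstance(user_input, GenerationPrompt):
        return generate(user_input.video_id, user_input.text)
    raise TypeError(f"unsupported input type: {type(user_input).__name__}")


if __name__ == "__main__":
    fake_search = lambda index_id, text: f"search {index_id!r} for {text!r}"
    fake_generate = lambda video_id, text: f"generate from {video_id!r}: {text!r}"
    print(route(SearchQuery("a dog catching a frisbee", "idx-1"), fake_search, fake_generate))
    print(route(GenerationPrompt("Summarize this video", "vid-1"), fake_search, fake_generate))
```

Passing the handlers in as callables keeps the routing logic testable without network access; in a real application they would wrap the platform's search and generation endpoints.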