Video understanding engines

A video understanding engine consists of a family of deep neural networks built on top of our multimodal foundation model for video understanding. The platform uses an engine to process your videos and create video embeddings, so that your video data becomes available for downstream tasks such as search or classification.

While using the API, you interact with an engine in the following ways:

  • Creating indexes: Each engine uses its own family of deep learning models to index videos. When you create an index, you assign it to an engine and specify how the engine processes your videos by passing the index_options parameter in the body of the request. These settings apply to all the videos you upload to the index and cannot be changed after the index is created.

  • Performing searches: When you perform a search, you pass at least the following parameters:

    • Your search query. Note that the API supports full natural language-based search. The following examples are valid queries: "birds flying near a castle", "sun shining on water", "chickens on the road", "an officer holding a child's hand", "crowd cheering in the stadium".
    • The unique identifier of the index that you want to search.
    • The source of information the engine uses when performing a search. For details, see the Search options page.

    The engine uses these parameters to find the moments in your videos that match your query and returns an array of objects. The fields in the request and the response depend on whether you're using simple or combined queries. For details, see the API Reference > Search or API Reference > Combined queries page.

  • Classifying videos: When classifying videos, you pass at least the following parameters:

    • An array of objects representing the classes and prompts based on which the platform must classify your videos.
    • The source of information the engine uses when classifying your videos. For details, see the Search options page.

    The fields in the request and the response depend on whether you're classifying a single video or all the videos within an index. For details, see the API Reference > Classify a video or API Reference > Classify all the videos within an index page.
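The three interactions above can be sketched as request bodies. The following is a minimal illustration, not the definitive request format: only index_options and the marengo2.5 engine name come from this page. The other field names (index_name, engine_id, index_id, query, search_options, classes, options), the option value "visual", and the placeholder identifiers are assumptions made for the example, and the helper functions are hypothetical. Check the API Reference pages for the exact fields before sending a request.

```python
import json

# Everything except index_options and the engine name is an assumption made
# for illustration: field names, option strings, and placeholder identifiers.
RECOMMENDED_ENGINE = "marengo2.5"


def build_index_request(index_name, index_options, engine_id=RECOMMENDED_ENGINE):
    """Body for creating an index that is assigned to one engine."""
    if not index_options:
        raise ValueError("index_options must name at least one option")
    return {
        "index_name": index_name,  # hypothetical field name
        "engine_id": engine_id,    # hypothetical field name
        "index_options": list(index_options),
    }


def build_search_request(index_id, query, search_options):
    """Body for a simple search: query, index, and sources of information."""
    if not query.strip():
        raise ValueError("query must not be empty")
    return {
        "index_id": index_id,                    # hypothetical field name
        "query": query,                          # hypothetical field name
        "search_options": list(search_options),  # hypothetical field name
    }


def build_classify_request(classes, options):
    """Body for classification: classes with prompts, plus sources of information."""
    for cls in classes:
        if not cls.get("name") or not cls.get("prompts"):
            raise ValueError("each class needs a name and at least one prompt")
    return {
        "classes": [
            {"name": cls["name"], "prompts": list(cls["prompts"])}
            for cls in classes
        ],
        "options": list(options),  # hypothetical field name
    }


if __name__ == "__main__":
    index_body = build_index_request("my-index", ["visual"])
    search_body = build_search_request(
        "<YOUR_INDEX_ID>", "birds flying near a castle", ["visual"]
    )
    classify_body = build_classify_request(
        [{"name": "Sports", "prompts": ["crowd cheering in the stadium"]}],
        ["visual"],
    )
    # Print the bodies instead of sending them; no network call is made.
    for body in (index_body, search_body, classify_body):
        print(json.dumps(body, indent=2))
```

The helpers only build and validate dictionaries, so you can inspect the payloads before wiring them into an HTTP client of your choice. Because the index settings cannot be changed later, validating index_options before the request is sent is the cheapest place to catch a mistake.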

To handle different use cases and to improve the performance of the platform, Twelve Labs has developed the engines described in the sections below.

Marengo 2.5

Marengo 2.5 is the latest and best-performing video understanding engine by Twelve Labs.

📘

Note

Twelve Labs strongly recommends you use marengo2.5.

Marengo 2

This version introduced significant performance improvements.

Marengo

Marengo is the engine that was available when the platform launched. It allows you to find the exact moments in your videos by writing semantic queries in everyday language.