Release notes

The sections below list all new features, enhancements, and changes to the platform, in reverse chronological order.

Version 1.3

January 13, 2025

Twelve Labs announces the public preview release of Pegasus 1.2 Small, our latest video understanding model.

Key improvements:
The new model offers significant improvements over Pegasus 1.1:

  • Extended video processing capacity from 30 minutes to 1 hour per video.
  • Enhanced performance across video-language tasks compared to Pegasus 1.1 and other models of the same size (Small ~ 8B).
  • More granular visual comprehension of objects, on-screen text, and numerical content.
  • More accurate temporal grounding and timestamp identification. For example, you can ask questions about the timestamps of certain events.

During the preview phase, the model is available only for new and sample indexes. Existing Pegasus 1.1 indexes remain fully supported. All current indexes will be automatically migrated to Pegasus 1.2 at no cost during the official release (date to be announced).

Note that the model may produce occasional errors or hallucinations. For support or feedback, contact [email protected].

December 2, 2024

Twelve Labs is proud to introduce the following new features and improvements:

  • Marengo 2.7: This new version of the Marengo video understanding engine improves accuracy and performance in the following areas:
    • Multimodal processing that combines visual, audio, and text elements.
    • Fine-grained image-to-video search: detect brand logos, text, and small objects (as small as 10% of the video frame).
    • Improved motion search.
    • Counting capabilities.
    • More nuanced audio comprehension: music, lyrics, sound, and silence.
      For more details on the new features and improvements in this version, refer to this blog post: Introducing Marengo 2.7: Pioneering Multi-Vector Embeddings for Advanced Video Understanding.
  • Simplified modalities:
    • visual: includes objects, actions, text OCR, logos.
    • audio: includes speech, music, and ambient sounds.
    • conversation has been deprecated.
    • text_in_video and logo are now part of visual.
  • Streamlined endpoint structure: Several endpoints and parameters have been deprecated, removed, or renamed.
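
The modality changes above can be sketched as a small lookup that collapses legacy option names into the two 1.3 modalities. This is a minimal illustration, not part of any official SDK: the helper name `migrate_search_options` and the choice to report dropped options separately are assumptions made for this example.

```python
# Hypothetical helper: map pre-1.3 option names to the 1.3 modalities.
# The mapping follows the release note; everything else is an assumption.
LEGACY_TO_V13 = {
    "visual": "visual",
    "audio": "audio",
    "text_in_video": "visual",  # now part of visual
    "logo": "visual",           # now part of visual
}
DEPRECATED = {"conversation"}   # deprecated in 1.3, so it is dropped here


def migrate_search_options(options):
    """Return (new_options, dropped) for a list of legacy option names."""
    new_options = []
    dropped = []
    for name in options:
        if name in DEPRECATED:
            dropped.append(name)
            continue
        mapped = LEGACY_TO_V13.get(name)
        if mapped is None:
            raise ValueError(f"unknown search option: {name!r}")
        if mapped not in new_options:  # keep order, avoid duplicates
            new_options.append(mapped)
    return new_options, dropped


new_options, dropped = migrate_search_options(
    ["visual", "conversation", "text_in_video", "logo"]
)
print(new_options, dropped)
```

Reporting the dropped names instead of silently discarding them makes it easier to spot requests that relied on the deprecated conversation option.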

Notes

  • Version 1.3 of the API supports only Marengo 2.7.
  • Marengo 2.7 generates embeddings that are not backward compatible. You must reindex all your videos and regenerate all your embeddings with Marengo 2.7.
  • The audio search feature in Marengo 2.7 works best with full-sentence queries. Short queries may yield suboptimal results. This limitation is temporary and will be addressed in a future release.

See the Migration guide page for a detailed list of the changes and instructions on how you can update your code.

Version 1.2

If you have used the 1.1.2 version of the API, please refer to the following section for important information regarding the changes.

November 12, 2024

Improvements

  • Cloud-to-cloud Integrations API: The API has been updated to provide a more intuitive experience. The /tasks/transfers endpoint will be deprecated. Use the following endpoints instead:

November 5, 2024

Improvements

  • Embed API: The structure of the responses has been streamlined across all endpoints to provide a more consistent and intuitive experience:
    • Standardized object naming:
      • The video_embeddings field has been renamed to video_embedding.
      • The video_embedding object now encapsulates the embeddings, related metadata, and additional information.
    • Enhanced response structure:
      • The embedding vectors are now nested under an array named segments.
      • The metadata objects have been moved under their respective parent embedding objects.
      • The is_success boolean has been removed.
    • Affected endpoints:
  • Embed API: You can now retrieve vector embeddings for any indexed video by setting embed=true in your GET /indexes/{index-id}/videos/{video-id} requests.
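
As a rough sketch of the changes above, the snippet below builds the request path with embed=true and reads the segments array from a response that has already been parsed into a dictionary. The helper names are hypothetical, and the exact position of video_embedding inside a given endpoint's response is an assumption; only the video_embedding and segments names come from the notes above.

```python
from urllib.parse import quote, urlencode


def video_path(index_id, video_id, embed=False):
    """Build the path for GET /indexes/{index-id}/videos/{video-id}."""
    path = f"/indexes/{quote(index_id, safe='')}/videos/{quote(video_id, safe='')}"
    if embed:
        path += "?" + urlencode({"embed": "true"})
    return path


def get_segments(response):
    """Return the list stored under video_embedding.segments, or []."""
    if "video_embeddings" in response:
        # The plural field name belongs to the old response structure.
        raise ValueError("legacy response shape: expected 'video_embedding'")
    video_embedding = response.get("video_embedding") or {}
    return video_embedding.get("segments") or []


print(video_path("my_index", "my_video", embed=True))
```

Because the is_success boolean has been removed, a client following this sketch would rely on the HTTP status code rather than a field in the body.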

October 24, 2024

New features

  • Embed API:
    • You can now create image and audio embeddings in addition to the existing video and text embeddings. See the Create embeddings page for details.
    • You can now retrieve a list of the video embedding tasks in your account by invoking the GET method of the /embed/tasks endpoint.

July 7, 2024

New features

  • Pegasus 1.1: The 1.1 version of the Pegasus video understanding engine has been released, introducing the following enhancements:

    • Improved model accuracy for video description and question-answering.
    • Fine-grained visual understanding and instruction following.
    • Streaming support when generating open-ended texts. For details, refer to the Streaming responses page.
    • Increased maximum prompt length to 1500 characters.
    • Extended maximum video duration to 30 minutes.

    NOTE: Effective July 8, 2024, Pegasus 1.0 is no longer supported. All existing indexes created with Pegasus 1.0 will be automatically upgraded to Pegasus 1.1. No manual intervention is required for this migration process, and all indexes will utilize Pegasus 1.1 upon completion.

June 18, 2024

New features

  • Image-to-Video Search API: Twelve Labs is proud to introduce the Image-to-Video Search API. This new API allows you to find semantically related video segments by providing an image as a query. The platform identifies similar content within videos. To get started with the Image-to-Video Search API, refer to the Image queries page.

May 15, 2024

New features

  • Embed API: Twelve Labs is proud to introduce the Embed API. You can use this new API to create multimodal embeddings that are contextual vector representations for your videos and texts. You can utilize multimodal embeddings in various downstream tasks, including but not limited to training custom multimodal models for applications such as clustering, classification, search, recommendation, and anomaly detection. See the Create embeddings page for details.

March 12, 2024

New features

  • Twelve Labs is proud to introduce the new versions of its video understanding models:
    • Marengo 2.6: A new state-of-the-art (SOTA) multimodal foundation model capable of performing any-to-any search tasks, including Text-To-Video, Text-To-Image, Text-To-Audio, Audio-To-Video, Image-To-Video, and more. Note that the platform currently supports text-to-video search and classification features. Other modalities will be supported in a future release. This model represents a significant leap in video understanding technology, enabling more intuitive and comprehensive search capabilities across various media types. For an overview of the new features and improvements in this version, refer to this blog post: Introducing Marengo 2.6: A New State-of-the-Art Video Foundation Model for Any-to-Any Search.
    • Pegasus 1.0 beta: This version of the model provides fine-grained video descriptions, summaries, and question-answering capabilities. For an overview of the new features and improvements in this version, refer to this blog post: Pegasus-1 Open Beta: Setting New Standards in Video-Language Modeling.
  • The platform now supports search queries in multiple languages. For a complete list of supported languages, refer to the Supported languages page.

Updates

  • You can now enable the Pegasus and Marengo video understanding engines on the same index.

February 15, 2024

New features

October 30, 2023

New features

Version 1.2 of the Twelve Labs Video Understanding Platform introduces the following new features:

  • The alpha version of the Pegasus video understanding engine has been released. You can now use it to generate text from video.
  • You can now upload videos from external providers. Currently, only YouTube is supported as an external provider, but we will add support for additional providers in the future. See the Upload from external providers page for details.

Updates

This section lists the differences between version 1.1.2 and version 1.2 of the Twelve Labs Video Understanding API.

  • When you make an API call, make sure that you specify the 1.2 version in the URL.
    The URL should look similar to the following one: https://api.twelvelabs.io/v1.2/{resource}/{path_parameters}?{query_parameters}. For more details, see the Call an endpoint section.
  • To enable the utilization of multiple engines for an index, the following changes have been made:
    • POST /indexes: The engine_id and indexing_options parameters of the request have been deprecated. Instead, you can now define the engine configuration as a list of objects. See the Create an index page for details.
    • GET /indexes/{index_id}: The engine_id field in the response has been superseded by an array of objects named engines. See the Retrieve an index page for details.
    • GET /indexes:
      • The engine_id field in the response has been superseded by an array of objects named engines. See the List indexes page for details.
      • The engine_family query parameter has been introduced, allowing you to filter by engine family.
      • The index_options query parameter has been marked for deprecation. You can still use it in this version of the API, but it will be deprecated in a future release. Instead, use engine_options or engine_family.
    • GET /engines: The allowed_index_option field in the response has been renamed to allowed_engine_options. See the List engines page for details.
    • GET /engines/{engine-id}: The allowed_index_option field in the response has been renamed to allowed_engine_options. See the Retrieve an engine page for details.
  • The /search and /search/{page-token} endpoints no longer return the conversation_option, search_options, and query fields.
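
The versioned URL pattern described above can be assembled with a small helper. This is a minimal sketch: the `build_url` name, its signature, and the "marengo" value passed to engine_family are assumptions for illustration, and the base URL is taken from the pattern shown in this section.

```python
from urllib.parse import quote, urlencode

BASE = "https://api.twelvelabs.io"  # host taken from the URL pattern above


def build_url(resource, *path_parameters, version="v1.2", **query_parameters):
    """Build {BASE}/{version}/{resource}/{path_parameters}?{query_parameters}."""
    parts = [resource.strip("/")]
    parts += [quote(str(p), safe="") for p in path_parameters]
    url = f"{BASE}/{version}/" + "/".join(parts)
    if query_parameters:
        url += "?" + urlencode(query_parameters)
    return url


# List indexes filtered by engine family (the value is a placeholder).
print(build_url("indexes", engine_family="marengo"))
# Retrieve a single index by its identifier (the identifier is a placeholder).
print(build_url("indexes", "my_index_id"))
```

Keeping the version in one default argument means a later move to another API version touches a single line.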

Version 1.1.2

If you have used the 1.1.1 version of the API, please refer to the following section for important information regarding the changes.

Improvements

To further improve the usability of the /classify endpoint, the following changes have been made:

  • The endpoint now allows you to classify a set of videos. The video_id parameter has been deprecated and now you must pass an array of strings named video_ids instead. Each element of the array represents the unique identifier of a video you want to classify. For details, see the Classify a set of videos page.

  • The threshold field in the request is now an object, and you can use it to filter based on the following criteria:

    • The confidence level that a video matches the specified class.
    • The confidence level that a clip matches the specified class.
    • The duration ratio, which is the sum of the lengths of the matching video clips inside a video divided by the total length of the video.

    For details, see the Filtering > Content classification page.

  • The endpoint now supports pagination.

  • The duration-weighted score has been deprecated. When setting the show_detailed_score parameter to true, the platform now returns the maximum, average, and normalized scores.
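
The duration ratio defined above is plain arithmetic, so a short worked example may help. The function name and the representation of clips as (start, end) pairs in seconds are assumptions of this sketch, and it assumes the matching clips do not overlap.

```python
def duration_ratio(clips, video_duration):
    """Sum of matching clip lengths divided by the total video length.

    clips: iterable of (start, end) pairs in seconds, assumed non-overlapping.
    """
    if video_duration <= 0:
        raise ValueError("video_duration must be positive")
    matched = sum(end - start for start, end in clips)
    return matched / video_duration


# Two matching clips of 10 s and 20 s inside a 120 s video.
print(duration_ratio([(0, 10), (30, 50)], 120))  # 30 / 120 = 0.25
```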

Version 1.1.1

If you have used the 1.1 version of the API, please refer to the following sections for important information regarding the changes.

New features

Version 1.1.1 of the Twelve Labs Video Understanding Platform introduces the following new features:

Improvements

To further improve flexibility, usability, and clarity, the following changes have been made:

  • Combined queries:
    • You can now define global values for the search_options and conversation_option parameters for the entire request instead of on a per-query basis. For details, see the Use combined queries page.
    • The /beta/search endpoint has been renamed to /combined-search.
  • Logo detection: The logo add-on has been deprecated. To enable logo detection for an index, you must now use the logo indexing option.
  • Conversation option: The transcription conversation option has been renamed to exact_match.
  • Classifying videos:
    • The labels parameter has been renamed to classes.
    • The threshold field you can use to narrow down a response obtained from the platform is now of type int. For details, see the API Reference > Classify a video page.
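
The renames above can be applied mechanically to an existing request. The sketch below is only an illustration under stated assumptions: `migrate_request` is a hypothetical helper, and it assumes the request body is a flat dictionary whose conversation_option value is a plain string.

```python
# Renames taken from the list above; the helper itself is hypothetical.
RENAMED_PATHS = {"/beta/search": "/combined-search"}
RENAMED_KEYS = {"labels": "classes"}


def migrate_request(path, payload):
    """Return (path, payload) updated from 1.1 naming to 1.1.1 naming."""
    new_path = RENAMED_PATHS.get(path, path)
    new_payload = {RENAMED_KEYS.get(k, k): v for k, v in payload.items()}
    # The transcription conversation option is now called exact_match.
    if new_payload.get("conversation_option") == "transcription":
        new_payload["conversation_option"] = "exact_match"
    return new_path, new_payload


print(migrate_request("/beta/search", {"conversation_option": "transcription"}))
```

The logo add-on change is not covered here because it affects how an index is configured rather than the name of a field.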

Version 1.1

The introduction of new features and improvements in the 1.1 version of the Twelve Labs Video Understanding Platform has required changes to some endpoints. If you have used the 1.0 version of the API, please refer to the following sections for important information regarding the changes.

New features

Version 1.1 of the Twelve Labs Video Understanding Platform introduces the following new features:

  • Classification of content: You can now define a list of labels that you want to classify your videos into, and the new classification API endpoint will return the duration for which the specified labels have appeared in your videos and the confidence that each of the matches represents the label you've specified. For more details, see the Classify page.
  • Combined Queries: The 1.1 version of the API introduces a new format of search queries named combined queries. A combined query includes any number of subqueries linked with any number of logical operators. Combined queries are executed in one API request.
    Combined queries support the following additional features:
    • Negating a condition: In addition to the existing AND operator, the platform now allows you to use the NOT operator to negate a condition. For example, this allows you to write a query that retrieves all the video clips in which someone is cooking but neither spaghetti nor lasagna is mentioned in the conversation.
    • The THEN operator: The platform now supports the THEN operator that allows you to specify that the platform must return only the results for which the order of the matching video clips is the same as the order of your queries.
    • Time-proximity search: The Twelve Labs Video Understanding API now allows you to use the proximity parameter to extend the lower and upper boundaries of each subquery. For example, this allows you to write a query that finds all car accidents that happened within a specific interval of time before someone wins a race.
      For details, see the Use combined queries page.
  • Logo detection: The platform can now detect brand logos.

Updates

This section lists the differences between version 1.0 and version 1.1 of the Twelve Labs Video Understanding API.

  • When you make an API call, make sure that you specify the 1.1 version in the URL.
    The URL should look similar to the following one: https://api.twelvelabs.io/v1.1/{resource}/{path_parameters}?{query_parameters}. For more details, see the Call an endpoint section.

  • The following methods now return a 200 OK status code when the response is empty:

    • [GET] /indexes
    • [GET] /tasks
    • [GET] /indexes/{index_id}/videos
  • The /tasks endpoint is now a separate endpoint and is no longer part of the /indexes endpoint. The table below shows the changes made to each method of the /tasks endpoint:

    1.0                                 1.1
    GET /indexes/tasks                  GET /tasks
    POST /indexes/tasks                 POST /tasks
    GET /indexes/tasks/{task_id}        GET /tasks/{task_id}
    DELETE /indexes/tasks/{task_id}     DELETE /tasks/{task_id}
    POST /indexes/tasks/transfers       POST /tasks/transfers
    GET /indexes/tasks/status           GET /tasks/status
  • The /indexes/tasks/{task_id}/video_id endpoint has been deprecated. You can now retrieve the unique identifier of a video by invoking the GET method of the /tasks/{task_id} endpoint. The response will contain a field named video_id. For details, see steps six and seven on the Upload from the local file system page.

  • When an error occurs, the platform now follows the recommendations of the RFC 9110 standard. Instead of numeric codes, the platform now returns string values containing human-readable descriptions of the errors. The format of the error messages is as follows:

    • code: A string representing the error code.
    • message: A human-readable string describing the error, intended to be suitable for display in a user interface.
    • (Optional) docs_url: The URL of the relevant documentation page.
      For example, if you tried to list all the videos in an index and the unique identifier of the index you specified didn't exist, the 1.0 version of the API returned an error similar to the following one:
    {
      "error_code": 201,
      "message": "ID 234234 does not exist"
    }
    

    Now, when using the 1.1 version of the API, the error should look similar to the following one:

    {
      "code": "parameter_not_provided",
      "message": "The index_id parameter is required but was not provided."
    }
    

    For a list of error messages, see the API Reference > Error codes page.

  • The next_page_id and prev_page_id fields of the page_info object have been renamed to next_page_token and prev_page_token.

  • The type field has been removed from all the responses.

  • When performing searches specifying multiple search options, the platform returns an object containing the confidence level that a specific video clip matched your search terms for each type of search. In version 1.0, this field was a dictionary named module_confidence. In version 1.1, this field is named module and is of type array. For details, see the Using multiple sources of information section.

  • The POST method of the /search/{page-token} endpoint has been deprecated. To retrieve the subsequent pages, you must now call the GET method of the /search/{page-token} endpoint, passing it the unique identifier of the page you want to retrieve. For details, see the Pagination > Search results page.
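
Two of the changes listed above, the relocated /tasks endpoint and the new error format, lend themselves to a short client-side sketch. The helper names below are hypothetical, and the code assumes error bodies have already been parsed from JSON into dictionaries.

```python
import re

# Matches the 1.0 prefix only when it is a whole path segment.
_LEGACY_TASKS = re.compile(r"^/indexes/tasks(?=/|$)")


def migrate_task_path(path):
    """Rewrite a 1.0 task path to its 1.1 equivalent."""
    new_path = _LEGACY_TASKS.sub("/tasks", path)
    # /indexes/tasks/{task_id}/video_id is deprecated; GET /tasks/{task_id}
    # now returns a video_id field instead.
    if new_path.endswith("/video_id"):
        new_path = new_path[: -len("/video_id")]
    return new_path


def parse_error(body):
    """Return (code, message, docs_url) from a 1.0 or 1.1 error body."""
    if "code" in body:  # 1.1: string code, optional docs_url
        return str(body["code"]), body.get("message", ""), body.get("docs_url")
    if "error_code" in body:  # 1.0: numeric code, no docs_url
        return str(body["error_code"]), body.get("message", ""), None
    raise ValueError("unrecognized error body")


print(migrate_task_path("/indexes/tasks/abc/video_id"))
print(parse_error({"error_code": 201, "message": "ID 234234 does not exist"}))
```

Normalizing both error shapes to the same tuple lets calling code handle responses from either version while a migration is in progress.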