Migration guide
This guide shows how to migrate your applications to version 1.3 of the API, which introduces significant improvements to the platform's video understanding capabilities, simplified modalities, and a streamlined endpoint structure.
Important
You must use SDK version 0.4.x or later to access API version 1.3. Earlier SDK versions (0.3.x and below) only support the deprecated API version 1.2.
What’s new in v1.3?
- Marengo 2.7: A new version of the Marengo video understanding model has been released. Version 1.3 of the API supports only Marengo 2.7. This new version improves accuracy and performance in the following areas:
- Multimodal processing that combines visual, audio, and text elements.
- Fine-grained image-to-video search: detect brand logos, text, and small objects (as small as 10% of the video frame).
- Improvement in motion search capability.
- Counting capabilities.
- More nuanced audio comprehension: music, lyrics, sound, and silence.
- Simplified modalities:
  - visual: includes objects, actions, text OCR, and logos.
  - audio: includes speech, music, and ambient sounds.
  - conversation has been deprecated. text_in_video and logo are now part of visual.
- Streamlined endpoint structure: Several endpoints and parameters have been deprecated, removed, or renamed.
Note
This guide presents the changes to the API. Since the SDKs reflect the structure of the API, review the Migration examples section below and the relevant SDK reference sections to understand how these changes have been implemented.
Breaking changes
This section presents the changes that require updates to your code and includes the following subsections:
- Global changes that affect multiple endpoints
- Changes organized by endpoint and functionality (example: upload videos, manage indexes, etc.)
In the sections below, see the Required Action column for each change, then use the corresponding example in the Migration examples section to update your code.
Global changes
Deprecated endpoints
Upload videos
Manage indexes
Manage videos
Search
The Generate API has been renamed to the Analyze API
The Generate API has been renamed to the Analyze API to more accurately reflect its purpose of analyzing videos to generate text. This update includes changes to specific API endpoints and SDK methods, outlined below. You can continue using the Generate API until July 30, 2025. After this date, the Generate API will be deprecated, and you must transition to the Analyze API.
API endpoint changes:
- The /generate endpoint is now the /analyze endpoint.
- The /gist endpoint remains unchanged.
- The /summarize endpoint remains unchanged.
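For direct HTTP integrations, the rename amounts to calling /analyze where you previously called /generate. The sketch below assumes the v1.3 base URL (https://api.twelvelabs.io/v1.3), the x-api-key header, and a video_id/prompt request body; verify these details against the API reference.

```python
import requests

API_KEY = "<YOUR_API_KEY>"
BASE_URL = "https://api.twelvelabs.io/v1.3"  # v1.3 base URL (assumption; confirm in the API reference)

# Before (v1.2): POST https://api.twelvelabs.io/v1.2/generate
# After  (v1.3): POST {BASE_URL}/analyze
response = requests.post(
    f"{BASE_URL}/analyze",
    headers={"x-api-key": API_KEY},
    json={
        "video_id": "<VIDEO_ID>",
        "prompt": "Summarize the key events in this video.",
    },
)
print(response.json())
```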
SDK method changes:
The generate prefix has been removed from method names, and the methods below have been renamed as follows:
- generate.gist is now gist
- generate.summarize is now summarize
- generate.text is now analyze
- generate.text_stream is now analyze_stream (Python)
- generate.textStream is now analyzeStream (Node.js)
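As a rough illustration of the renamed methods in the Python SDK (0.4.x). The method names come from the list above, but the exact signatures (video_id, prompt, types, type) and the .data field on the response are assumptions to verify against the SDK reference.

```python
from twelvelabs import TwelveLabs

client = TwelveLabs(api_key="<YOUR_API_KEY>")

# v1.2 SDK (0.3.x): text generation lived under the generate namespace
# result = client.generate.text(video_id="<VIDEO_ID>", prompt="Describe this video.")

# v1.3 SDK (0.4.x): the generate prefix has been removed
result = client.analyze(video_id="<VIDEO_ID>", prompt="Describe this video.")
print(result.data)

# gist and summarize keep their names but drop the generate prefix
gist = client.gist(video_id="<VIDEO_ID>", types=["title", "topic", "hashtag"])
summary = client.summarize(video_id="<VIDEO_ID>", type="summary")
```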
Parameter changes:
To maintain compatibility, update your API calls and SDK methods to the new names before July 30, 2025. For additional details, refer to the relevant API and SDK reference sections.
Non-breaking changes
These changes add new functionality while maintaining backward compatibility.
Upload videos
Migration steps
Migrating to v1.3 involves two main steps:
- Update your integration
- Update your code. Refer to the Migration examples section below for details.
1. Update your integration
Choose the appropriate method based on how you interact with the TwelveLabs API:
- Official SDKs: Install version 0.4.x or later.
- HTTP client: Update your base URL.
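As a minimal illustration of both options (the package name and the v1.3 base URL shown here are assumptions to confirm against the installation instructions):

```python
# Official SDKs (Python shown): upgrade to 0.4.x or later, e.g. in your shell:
#   pip install --upgrade "twelvelabs>=0.4"

# HTTP clients: point requests at the v1.3 base URL instead of v1.2
OLD_BASE_URL = "https://api.twelvelabs.io/v1.2"  # deprecated
NEW_BASE_URL = "https://api.twelvelabs.io/v1.3"  # use this going forward
```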
2. Migration examples
Below are examples showing how to update your code for key breaking changes. Choose the examples matching your integration type.
Create indexes
Creating an index in version 1.3 includes the following key changes:
- Renamed parameters: The parameters that previously began with engine* have been renamed to model*.
- Simplified modalities: The previous modalities of [visual, conversation, text_in_video, logo] have been simplified to [visual, audio].
- Marengo version update: Use “marengo2.7” instead of “marengo2.6”.
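A minimal sketch of index creation with the Python SDK (0.4.x) that applies all three changes above. The client.index.create method and the shape of the models list are assumptions based on the renamed parameters; check the SDK reference for the exact signature.

```python
from twelvelabs import TwelveLabs

client = TwelveLabs(api_key="<YOUR_API_KEY>")

# v1.2 (SDK 0.3.x): engines=[{"name": "marengo2.6",
#                             "options": ["visual", "conversation", "text_in_video", "logo"]}]

# v1.3 (SDK 0.4.x): engine* parameters renamed to model*, modalities reduced to visual/audio
index = client.index.create(
    name="my-index",
    models=[
        {
            "name": "marengo2.7",            # Marengo 2.7 is the only supported version in v1.3
            "options": ["visual", "audio"],  # simplified modalities
        }
    ],
)
print(index.id)
```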
Perform a search request
Performing a search request includes the following key changes:
- Simplified modalities: The previous modalities of [visual, conversation, text_in_video, logo] have been simplified to [visual, audio].
- Deprecated parameter: The conversation_option parameter has been deprecated.
- Streamlined response: The metadata and modules fields in the response have been deprecated.
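A minimal search sketch with the Python SDK (0.4.x). The query_text and options parameter names and the clip fields printed below are assumptions to verify against the SDK reference.

```python
from twelvelabs import TwelveLabs

client = TwelveLabs(api_key="<YOUR_API_KEY>")

# v1.2: options could include "conversation", "text_in_video", and "logo",
#       and conversation_option was available.
# v1.3: options are limited to "visual" and "audio"; conversation_option is gone.
result = client.search.query(
    index_id="<INDEX_ID>",
    query_text="a person riding a bicycle",
    options=["visual", "audio"],
)

for clip in result.data:
    # The deprecated metadata and modules fields are no longer read here
    print(clip.video_id, clip.start, clip.end, clip.score)
```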
Create embeddings
Creating embeddings includes the following key changes:
- Marengo version update: Use “Marengo-retrieval-2.7” instead of “Marengo-retrieval-2.6”.
- Renamed parameters: The parameters that previously began with engine* have been renamed to model*.
The following example creates a text embedding, but the principles demonstrated are similar for image, audio, and video embeddings:
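A sketch under the assumption that the SDK exposes an embed.create method taking the renamed model_name parameter; confirm the method name and the response fields in the SDK reference.

```python
from twelvelabs import TwelveLabs

client = TwelveLabs(api_key="<YOUR_API_KEY>")

# v1.2: engine_name="Marengo-retrieval-2.6"
# v1.3: the engine* parameter is now model*, and the model version is 2.7
res = client.embed.create(
    model_name="Marengo-retrieval-2.7",
    text="A quick red fox jumps over the fence",
)

# The response contains the embedding vector(s); field names may differ by SDK version
print(res)
```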
Use Pegasus to classify videos
The Pegasus video understanding model analyzes video content and generates descriptive text, enabling flexible video classification. You can use established category systems, such as YouTube video categories or the IAB Tech Lab Content Taxonomy, or define custom categories for your specific needs.
The example below classifies a video based on YouTube’s video categories:
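One possible approach, assuming the video is indexed with a Pegasus-enabled index and using the analyze method introduced above; the prompt wording and the result.data field are illustrative assumptions.

```python
from twelvelabs import TwelveLabs

client = TwelveLabs(api_key="<YOUR_API_KEY>")

# Ask Pegasus to pick the best-fitting YouTube category for the video
prompt = (
    "Classify this video into exactly one of the following YouTube categories: "
    "Autos & Vehicles, Comedy, Education, Entertainment, Gaming, Howto & Style, "
    "Music, News & Politics, Pets & Animals, Science & Technology, Sports, Travel & Events. "
    "Return only the category name."
)

result = client.analyze(video_id="<VIDEO_ID>", prompt=prompt)
print(result.data)
```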
Detect logos
You can search for logos using text or image queries:
- Text queries: For logos that include text (example: Nike).
- Image queries: For logos without text (example: Apple’s apple symbol).
The following example searches for the Nike logo using a text query:
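A sketch of a text query restricted to the visual modality; parameter and field names are assumptions, as in the earlier search example.

```python
from twelvelabs import TwelveLabs

client = TwelveLabs(api_key="<YOUR_API_KEY>")

# Logos that contain text can be found with a plain text query against the visual modality
result = client.search.query(
    index_id="<INDEX_ID>",
    query_text="Nike logo",
    options=["visual"],
)

for clip in result.data:
    print(clip.video_id, clip.start, clip.end, clip.score)
```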
The following example searches for the Apple logo using an image query:
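A sketch of an image query; the query_media_type and query_media_url parameter names and the example image URL are assumptions to verify against the SDK reference.

```python
from twelvelabs import TwelveLabs

client = TwelveLabs(api_key="<YOUR_API_KEY>")

# Image query: point the platform at a reference image of the logo
result = client.search.query(
    index_id="<INDEX_ID>",
    query_media_type="image",
    query_media_url="https://example.com/apple-logo.png",  # hypothetical reference image URL
    options=["visual"],
)

for clip in result.data:
    print(clip.video_id, clip.start, clip.end)
```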
Search for text shown in videos
To search for text in videos, use text queries that target either on-screen text or spoken words in transcriptions rather than objects or concepts. The platform searches across both:
- Text shown on screen (such as titles, captions, or signs)
- Spoken words from audio transcriptions
Note that the platform may return both textual and visual matches. For example, searching for the word “smartphone” might return:
- Segments where “smartphone” appears as on-screen text.
- Segments where “smartphone” is spoken.
- Segments where smartphones are visible as objects.
The example below finds all the segments where the word “innovation” appears as on-screen text or as a spoken word in transcriptions:
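A sketch that searches both modalities so on-screen text and transcribed speech are covered; parameter names are assumptions, as above.

```python
from twelvelabs import TwelveLabs

client = TwelveLabs(api_key="<YOUR_API_KEY>")

# Search both modalities: "visual" covers on-screen text, "audio" covers spoken words
result = client.search.query(
    index_id="<INDEX_ID>",
    query_text="innovation",
    options=["visual", "audio"],
)

for clip in result.data:
    print(f"{clip.video_id}: {clip.start:.1f}s - {clip.end:.1f}s")
```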