Navigate to the section that best addresses your query. If you don’t find an answer to your question, please contact us.

General questions

This section answers frequently asked general questions.

How does your model handle the temporal dimension of videos?

We use positional encoding, a technique from the Transformer architecture that conveys the position of each token in an input sequence. Here, the tokens represent the key scenes in the video. Positional encoding lets the model capture the order of scenes while preserving the parallel processing of self-attention.
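As an illustration only, the sketch below implements the sinusoidal positional encoding from the original Transformer architecture and adds it to a list of per-scene embeddings. The variant our model actually uses may differ; the function names and the scene embeddings are hypothetical.

```python
import math

def sinusoidal_positional_encoding(num_positions: int, d_model: int) -> list[list[float]]:
    """Build a num_positions x d_model table where
    PE[pos][2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos][2i+1] = cos(pos / 10000^(2i / d_model))."""
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(0, d_model, 2):  # i steps over even dimensions (the "2i" in the formula)
            angle = pos / (10000 ** (i / d_model))
            row.append(math.sin(angle))
            row.append(math.cos(angle))
        table.append(row)
    return table

def add_positions(scene_embeddings: list[list[float]]) -> list[list[float]]:
    """Add position encodings element-wise to hypothetical per-scene embeddings."""
    if not scene_embeddings:
        return []
    pe = sinusoidal_positional_encoding(len(scene_embeddings), len(scene_embeddings[0]))
    return [[x + p for x, p in zip(emb, row)] for emb, row in zip(scene_embeddings, pe)]

# Two toy scenes with 4-dimensional embeddings (hypothetical values).
scenes = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
encoded = add_positions(scenes)
```

Because the position signal is added to each scene's content vector instead of being fed in one step at a time, self-attention can still process all scenes in parallel while distinguishing their order.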

What is the maximum size of videos that can be stored in one index?

The Developer plan can accommodate up to 10,000 hours of video, whether in a single index or across all your indexes combined. For larger volumes, our Enterprise plan is better suited. Please contact us at sales@twelvelabs.io for more information.

How long does it take to index a video?

Indexing typically takes 30-40% of the video's duration. However, indexing time also depends on the number of concurrent indexing tasks, and delays can occur when many tasks are being processed simultaneously. If you’re on the Free plan and need faster indexing, consider upgrading to the Developer plan, which supports more concurrent tasks. We also offer a dedicated cloud deployment option for enterprise customers; contact us at sales@twelvelabs.io to discuss it.
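To make the 30-40% rule of thumb concrete, this small helper turns a video's duration into an expected indexing-time range. It is a back-of-the-envelope sketch: the function name is hypothetical, and it ignores the queueing delays caused by concurrent tasks.

```python
def estimate_indexing_minutes(video_minutes: float,
                              low_pct: int = 30,
                              high_pct: int = 40) -> tuple[float, float]:
    """Rough indexing-time range from the 30-40% rule of thumb.
    Excludes delays from concurrent indexing tasks."""
    if video_minutes < 0:
        raise ValueError("video_minutes must be non-negative")
    return (video_minutes * low_pct / 100, video_minutes * high_pct / 100)

print(estimate_indexing_minutes(60))  # (18.0, 24.0)
```

For example, a one-hour video would typically finish indexing in roughly 18 to 24 minutes when the queue is not congested.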

Can your model recognize natural sounds in videos?

Yes. The model analyzes both visual and audio information and learns to associate certain visual objects or situations with the sounds that frequently accompany them.

Can your models recognize text from other languages?

Yes, the models support multiple languages. See the Supported languages page for details.

How does your visual language model compare to other LLMs?

The platform takes a multimodal approach to video understanding. Whereas traditional LLMs rely on textual input, the platform interprets visuals, sounds, and spoken words together to deliver comprehensive and accurate results.

Can I use TwelveLabs with my own LLM or with LangChain?

You can optionally integrate our video-to-text model, Pegasus, with your own LLMs. We also provide an open-source project that demonstrates integration with LangChain; see the twelvelabs-io/tl-jockey repository to find out more.

How do I change my login method?

To change your login method (for example, from username/password to SSO or vice versa), contact our support team at support@twelvelabs.io to have your current account deleted, then create a new account with your preferred login method.

Does my invoice include a detailed cost breakdown?

If you’re on the Developer plan, TwelveLabs provides invoices that include a detailed cost breakdown. You can view your invoice using one of the following methods:

Email: Open the invoice email and select the View invoice and payment details button.
Playground: Log in to your account, go to the Billing & plan page, scroll to the Billing History section, and select the PDF for your invoice.

If you’re on the Enterprise plan, TwelveLabs provides invoices without detailed cost breakdowns.

Embed API

This section answers frequently asked questions related to the Embed API.

When should I use the Embed API versus the built-in search?

The Embed API and the built-in search service serve different purposes when working with video content.

Embed API

- Generate visual embeddings for:
  - RAG workflows
  - Hybrid search
  - Classification
  - Clustering
- Use the embeddings as input for your custom models
- Create flexible, domain-specific solutions
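Once you have embeddings, tasks such as search, RAG retrieval, and classification often reduce to comparing vectors. The sketch below ranks video segments against a query by cosine similarity. It is a minimal illustration, not TwelveLabs code: the segment IDs and three-dimensional vectors are hypothetical placeholders for real Embed API output, which you would retrieve separately and which has many more dimensions.

```python
import math
from typing import Dict, List, Sequence, Tuple

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)

def rank_segments(query: Sequence[float],
                  segments: Dict[str, List[float]],
                  top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k (segment_id, score) pairs most similar to the query."""
    scored = [(seg_id, cosine_similarity(query, vec)) for seg_id, vec in segments.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

# Toy vectors standing in for real embeddings (hypothetical values).
segments = {
    "clip_a": [0.9, 0.1, 0.0],
    "clip_b": [0.0, 1.0, 0.0],
    "clip_c": [0.7, 0.7, 0.0],
}
query = [1.0, 0.0, 0.0]
print(rank_segments(query, segments, top_k=2))
```

The same similarity function can drive simple classification (assign the label of the nearest labeled example) or serve as the distance measure for clustering.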

Built-in search service

- Perform semantic searches across multiple modalities:
  - Visual content
  - Conversation (human speech)
  - Text-in-video (OCR)
  - Logo
- Utilize production-ready, out-of-the-box functionality
- Ideal for projects not requiring additional customization

Analyze API

This section answers frequently asked questions related to the Analyze API.

What LLM does the Analyze API use?

The Analyze API employs our foundational Visual Language Model (VLM), which integrates a language encoder to extract multimodal data from videos and a decoder to generate concise text representations.

To use the Analyze API, do I need to reindex my videos if I already indexed them with Marengo?

Yes, you must reindex videos using the Pegasus engine. See the Analyze videos and Pricing pages for details.