Frequently asked questions
Navigate to the section that best addresses your query. If you don’t find an answer to your question, please contact us.
General questions
This section answers frequently asked general questions.
How does your model handle temporal dimension within videos?
We use positional encoding, a technique from the Transformer architecture that conveys information about the position of each token in a sequence. In this case, the tokens are the key scenes within the video. Positional encoding lets the model incorporate this sequential (temporal) information while preserving the parallel processing capability of self-attention in the Transformer architecture.
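For reference, the standard sinusoidal positional encoding from the original Transformer paper assigns each position pos and embedding dimension index i the values below; the exact encoding scheme used by our model may differ.

```latex
PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right), \qquad
PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right)
```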
What are the video hours and video count limits per index?
Video hours measure the total duration of video you index. The limits depend on your plan.
Note the following about the Free plan:
- You have a total of 600 minutes (10 hours) shared across indexing and analysis. You can split them however you choose. This limit is cumulative and does not reset if you delete indexes or videos.
- The example videos in the Playground (approximately 1 hour total) count toward this limit.
For details about each plan, see the Pricing page. To increase your limits, upgrade to the Developer plan.
How long does it take to index a video?
Indexing typically completes in 30-40% of the video’s duration. However, indexing time also depends on the number of concurrent indexing tasks, and delays can occur when too many tasks are processed simultaneously. If you’re on the Free plan and need faster indexing, consider upgrading to the Developer plan, which supports more concurrent tasks. We also offer a dedicated cloud deployment option for enterprise customers. Please contact us at sales@twelvelabs.io to discuss this option.
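For example, a 60-minute video typically finishes indexing in roughly 18 to 24 minutes, and a 10-minute clip in about 3 to 4 minutes.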
Can your model recognize natural sounds in videos?
Yes. The model analyzes both visual and audio information and learns correlations between visual objects or situations and the sounds that frequently accompany them.
Can your models recognize text from other languages?
Yes, the models support multiple languages. See the Supported languages page for details.
How does your visual language model compare to other LLMs?
The platform uses a multimodal approach to video understanding. Instead of relying on textual input like traditional LLMs, it interprets visuals, sounds, and spoken words to deliver comprehensive and accurate results.
Can I use TwelveLabs with my own LLM or with LangChain?
Yes. You can integrate our video-to-text model (Pegasus) with your own LLMs or with frameworks such as LangChain, for example by using the text that Pegasus generates from a video as input to your downstream pipeline.
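For example, here is a minimal sketch of that pattern using the twelvelabs Python SDK. The text-generation method and response field shown are assumptions that vary across SDK versions (for example, generate.text versus analyze), so check the current SDK reference; the final step simply prints the text you would pass to your own LLM or a LangChain chain.

```python
from twelvelabs import TwelveLabs

# Sketch only: method names may differ in your SDK version.
client = TwelveLabs(api_key="YOUR_API_KEY")

# Ask Pegasus for an open-ended text description of an already indexed video.
res = client.generate.text(
    video_id="YOUR_VIDEO_ID",
    prompt="Describe the key events in this video.",
)

# res.data holds the generated text. Pass it to your own LLM or a LangChain
# chain as context for downstream summarization or question answering.
print(res.data)
```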
How can I change my login method?
To change your login method (for example, from username/password to SSO or vice versa), contact our support team at support@twelvelabs.io to delete your current account, then create a new one with your preferred login method.
Does my invoice include a detailed cost breakdown?
If you’re on the Developer plan, TwelveLabs provides invoices that include a detailed cost breakdown. You can view your invoice using one of the following methods:
- Email: Open your invoice sent via email, and select the View invoice and payment details button.
- Playground: Go to the Billing & plan page, log in to your account, scroll to the Billing History section, and select the PDF for your invoice.
If you’re on the Enterprise plan, TwelveLabs provides invoices without detailed cost breakdowns.
Embed API
This section answers frequently asked questions related to the Embed API.
When should I use the Embed API versus the built-in search?
The Embed API and built-in search service offer different functionalities for working with visual content.
Embed API
- Generate visual embeddings for:
  - RAG workflows
  - Hybrid search
  - Classification
  - Clustering
- Use the embeddings as input for your custom models (see the sketch after these lists)
- Create flexible, domain-specific solutions
Built-in search service
- Perform semantic searches across multiple modalities:
  - Visual content
  - Conversation (human speech)
  - Text-in-video (OCR)
  - Logo
- Use production-ready, out-of-the-box functionality
- Ideal for projects that don’t require additional customization
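For example, once you retrieve segment embeddings from the Embed API (each segment is a vector of floats), you can plug them into your own similarity, classification, or clustering code. The sketch below is independent of any SDK: it computes cosine similarity between two placeholder vectors that stand in for real Embed API output.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder vectors standing in for segment embeddings returned by the
# Embed API; real embeddings are much higher-dimensional float arrays.
query_embedding = np.array([0.12, -0.05, 0.33, 0.08])
segment_embedding = np.array([0.10, -0.02, 0.35, 0.07])

score = cosine_similarity(query_embedding, segment_embedding)
print(f"similarity: {score:.3f}")  # higher scores indicate closer content
```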
Analyze API
This section answers frequently asked questions related to the Analyze API.
What LLM does the Analyze API use?
The Analyze API employs our foundational Visual Language Model (VLM), which integrates a video encoder that extracts multimodal information from videos with a language decoder that generates concise text representations.
To use the Analyze API, do I need to reindex my videos if I already indexed them with Marengo?
Yes, you must reindex videos using the Pegasus engine. See the Analyze videos and Pricing pages for details.
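As a rough sketch, reindexing means creating an index with Pegasus enabled (in the Playground or via the API) and then uploading the videos to it again. The example below shows the upload step with the twelvelabs Python SDK; the method and parameter names are assumptions that may differ between SDK versions, so check the current SDK reference.

```python
from twelvelabs import TwelveLabs

# Sketch only: assumes the twelvelabs Python SDK; names may vary by version.
client = TwelveLabs(api_key="YOUR_API_KEY")

# "YOUR_PEGASUS_INDEX_ID" is a placeholder for an index created with Pegasus
# enabled. Upload the video again so Pegasus can process it; existing
# Marengo-only indexes are not converted in place.
task = client.task.create(index_id="YOUR_PEGASUS_INDEX_ID", file="path/to/video.mp4")
task.wait_for_done()
print(task.status)
```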
How is video segmentation priced?
Pricing depends on your plan.
Free plan
The video duration counts toward your plan quota. The number of segment definitions does not affect this quota.
Developer plan
You pay based on how much video you process and how many segment definitions you include. If you provide the start_time and end_time parameters, you pay for that time range only. Otherwise, you pay for the full video duration. Each segment definition multiplies the cost.
Examples:
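- Suppose you analyze a 60-minute video with start_time=0 and end_time=1200 (the first 20 minutes) and include three segment definitions: the billable duration is 20 minutes × 3 = 60 minutes.
- Without start_time and end_time, the same request is billed for the full video: 60 minutes × 3 = 180 minutes.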
Note
Time ranges within individual segment definitions control which portions of the video are analyzed. The billable duration is always the full start_time–end_time span.
For current rates, see the Pricing page.