Analyze videos
The Analyze API suite uses a multimodal approach to analyze videos and generate text, processing visuals, sounds, spoken words, and on-screen text to provide a comprehensive understanding. This method captures nuances that unimodal interpretations might miss, allowing for accurate and context-rich text generation based on video content.
Key features:
- Multimodal analysis: Processes visuals, sounds, spoken words, and on-screen text for a holistic understanding of video content.
- Customizable prompts: Allows tailored outputs through instructive, descriptive, or question-based prompts.
- Flexible text generation: Supports various tasks, including summarization, chaptering, and open-ended text generation (a minimal request sketch follows this list).
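As a quick illustration of how a custom prompt drives text generation, here is a minimal sketch of an open-ended analysis request in Python. The endpoint path, header and field names, environment variable, and response shape shown here are assumptions for illustration only; consult the API reference for the exact values.

```python
import os

import requests

# Assumed endpoint path and field names, for illustration only.
ANALYZE_URL = "https://api.twelvelabs.io/v1.2/generate"


def analyze_video(video_id: str, prompt: str) -> str:
    """Send an open-ended analysis request and return the generated text."""
    response = requests.post(
        ANALYZE_URL,
        headers={"x-api-key": os.environ["TWELVE_LABS_API_KEY"]},  # assumed env var
        json={"video_id": video_id, "prompt": prompt},
        timeout=60,
    )
    response.raise_for_status()
    # The generated text is assumed to be returned in a "data" field;
    # adjust this to match the actual response schema.
    return response.json()["data"]


# Example: an instructive prompt, but descriptive or question-based prompts work too.
print(analyze_video("YOUR_VIDEO_ID", "Summarize the key events in this video."))
```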
Use cases:
- Content structuring: Organize and structure content for e-learning platforms to improve usability.
- SEO optimization: Optimize content to rank higher in search engine results.
- Highlight creation: Create short, engaging video clips for media and broadcasting.
- Incident reporting: Record and report incidents for security and law enforcement purposes.
To understand how your usage is measured and billed, see the Pricing page.
Depending on your use case, follow the steps in one of the dedicated guides.
Notes
- Your prompts can be instructive or descriptive, or you can also phrase them as questions.
- The platform generates text according to the model options enabled for your index, which determine the types of information the video understanding model processes. For example:
  - If both the `visual` and `audio` model options are enabled, the platform generates text based on both visual and audio information.
  - If only the `visual` option is enabled, the platform generates text based only on visual information.
- The maximum length of a prompt is 2,000 tokens.
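As a companion to the open-ended sketch above, the following shows what a task-specific request for chapter generation might look like. The summarize-style endpoint, the `type` parameter, and the response fields are assumptions for illustration; check the API reference for the exact names.

```python
import os

import requests

# Assumed endpoint path and parameter names, for illustration only.
SUMMARIZE_URL = "https://api.twelvelabs.io/v1.2/summarize"


def chapter_video(video_id: str) -> list:
    """Request chapter-style output for a video and return the chapter entries."""
    response = requests.post(
        SUMMARIZE_URL,
        headers={"x-api-key": os.environ["TWELVE_LABS_API_KEY"]},  # assumed env var
        json={"video_id": video_id, "type": "chapter"},  # assumed task selector
        timeout=60,
    )
    response.raise_for_status()
    # Chapter entries are assumed to be returned under a "chapters" key.
    return response.json().get("chapters", [])


for chapter in chapter_video("YOUR_VIDEO_ID"):
    print(chapter)
```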