Analyze videos
The Analyze API suite uses a multimodal approach to analyze videos and generate text, processing visuals, sounds, spoken words, and on-screen text to provide a comprehensive understanding. This method captures nuances that unimodal interpretations might miss, allowing for accurate and context-rich text generation based on video content.
Key features:
- Multimodal analysis: Processes visuals, sounds, spoken words, and on-screen text for a holistic understanding of video content.
- Customizable prompts: Allows tailored outputs through instructive, descriptive, or question-based prompts.
- Flexible text generation: Supports various tasks, including summarization, chaptering, and open-ended text generation (a minimal request sketch follows this list).
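As a quick illustration of how a custom prompt drives text generation, here is a minimal sketch of an open-ended analysis request in Python. The endpoint path, header and field names, environment variable, and response shape shown here are assumptions for illustration only; consult the API reference for the exact values.

```python
import os

import requests

# Assumed endpoint path and field names, for illustration only.
ANALYZE_URL = "https://api.twelvelabs.io/v1.2/generate"


def analyze_video(video_id: str, prompt: str) -> str:
    """Send an open-ended analysis request and return the generated text."""
    response = requests.post(
        ANALYZE_URL,
        headers={"x-api-key": os.environ["TWELVE_LABS_API_KEY"]},  # assumed env var
        json={"video_id": video_id, "prompt": prompt},
        timeout=60,
    )
    response.raise_for_status()
    # The generated text is assumed to be returned in a "data" field;
    # adjust this to match the actual response schema.
    return response.json()["data"]


# Example: an instructive prompt, but descriptive or question-based prompts work too.
print(analyze_video("YOUR_VIDEO_ID", "Summarize the key events in this video."))
```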
Use cases:
- Content structuring: Organize and structure content for e-learning platforms to improve usability.
- SEO optimization: Optimize content to rank higher in search engine results.
- Highlight creation: Create short, engaging video clips for media and broadcasting.
- Incident reporting: Record and report incidents for security and law enforcement purposes.
To understand how your usage is measured and billed, see the Pricing page.
Depending on your use case, follow the steps in one of the dedicated guides.
Notes
- Your prompts can be instructive or descriptive, or you can also phrase them as questions.
- The platform generates text according to the model options enabled for your index, which determine the types of information the video understanding model processes. For example:
  - If both the `visual` and `audio` model options are enabled, the platform generates text based on both visual and audio information.
  - If only the `visual` option is enabled, the platform generates text based only on visual information.
- The maximum length of a prompt is 2,000 tokens.
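As a companion to the open-ended sketch above, the following shows what a task-specific request for chapter generation might look like. The summarize-style endpoint, the `type` parameter, and the response fields are assumptions for illustration; check the API reference for the exact names.

```python
import os

import requests

# Assumed endpoint path and parameter names, for illustration only.
SUMMARIZE_URL = "https://api.twelvelabs.io/v1.2/summarize"


def chapter_video(video_id: str) -> list:
    """Request chapter-style output for a video and return the chapter entries."""
    response = requests.post(
        SUMMARIZE_URL,
        headers={"x-api-key": os.environ["TWELVE_LABS_API_KEY"]},  # assumed env var
        json={"video_id": video_id, "type": "chapter"},  # assumed task selector
        timeout=60,
    )
    response.raise_for_status()
    # Chapter entries are assumed to be returned under a "chapters" key.
    return response.json().get("chapters", [])


for chapter in chapter_video("YOUR_VIDEO_ID"):
    print(chapter)
```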