Analyze videos

The Analyze API suite uses a multimodal approach to analyze videos and generate text. It processes visuals, sounds, spoken words, and text to build a comprehensive understanding of the content. This approach captures nuances that unimodal interpretation might miss, which allows for accurate, context-rich text generation based on video content.

Note

This API was formerly known as the Generate API. The name has been updated to Analyze API to more accurately reflect its purpose of analyzing videos to generate text. You may continue using the /generate endpoint until July 30, 2025. After this date, you must use the /analyze endpoint.

Key features:

  • Multimodal analysis: Processes visuals, sounds, spoken words, and text for a holistic understanding of video content.
  • Customizable prompts: Allows tailored outputs through instructive, descriptive, or question-based prompts.
  • Flexible text generation: Supports various tasks, including summarization, chaptering, and open-ended text generation.

Use cases:

  • Content structuring: Organize and structure content for e-learning platforms to improve usability.
  • SEO: Optimize content to rank higher in search engine results.
  • Highlight creation: Create short, engaging video clips for media and broadcasting.
  • Incident reporting: Record and report incidents for security and law enforcement purposes.

To understand how your usage is measured and billed, see the Pricing page.

Depending on your use case, follow the steps in one of the guides below:

Notes
  • Your prompts can be instructive, descriptive, or phrased as questions.

  • The platform generates text according to the model options enabled for your index, which determine the types of information the video understanding model processes.

    Examples:

    • If both the visual and audio model options are enabled, the platform generates text based on both visual and audio information.
    • If only the visual option is enabled, the platform generates text based only on visual information.

  • The maximum length of a prompt is 2,000 tokens.
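
The notes above can be sketched in Python as a small client-side check. This is a minimal illustration under stated assumptions, not the platform's SDK: the function names, the payload field names (video_id, prompt), and the whitespace-based token estimate are all hypothetical. The platform's own tokenizer may count tokens differently, so treat the length check as a rough guard only.

```python
MAX_PROMPT_TOKENS = 2000  # prompt limit stated in the notes above


def approx_token_count(prompt):
    # Assumption: whitespace-separated words stand in for tokens.
    # The platform's real tokenizer may produce a different count.
    return len(prompt.split())


def information_sources(enabled_options):
    # Generated text draws only on the model options enabled for the index.
    # Returns the enabled options among "visual" and "audio", sorted by name.
    return sorted(set(enabled_options) & {"visual", "audio"})


def build_analyze_payload(video_id, prompt):
    # Hypothetical payload shape; the field names are illustrative only.
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    tokens = approx_token_count(prompt)
    if tokens > MAX_PROMPT_TOKENS:
        raise ValueError(
            f"prompt has about {tokens} tokens; the limit is {MAX_PROMPT_TOKENS}"
        )
    return {"video_id": video_id, "prompt": prompt}


if __name__ == "__main__":
    # Both options enabled: text is based on visual and audio information.
    print(information_sources(["visual", "audio"]))  # ['audio', 'visual']
    # Only the visual option enabled: text is based on visual information.
    print(information_sources(["visual"]))  # ['visual']
    # A question-style prompt, one of the three prompt styles mentioned above.
    print(build_analyze_payload("<YOUR_VIDEO_ID>", "What happens in this video?"))
```

Checking the prompt before sending a request surfaces an over-long prompt locally instead of as an API error. Because the estimate is approximate, a prompt near the limit can still be rejected by the platform.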