Modalities

Modalities represent the sources of information that the platform processes and analyzes in a video. Choose the modalities that match your needs:

Visual includes:

  • Actions, objects, and events in the video.
  • Text that appears on screen (through OCR).
  • Brand logos and visual elements.

Audio includes:

  • Ambient sounds, music, and sound effects.
  • Human speech and conversations.

You specify modalities through different parameters depending on your task:

  • model_options when you create an index.
  • search_options when you search videos.
  • embedding_option when you create embeddings.

Model options

When you create an index, specify which modalities the platform must process. You can include the following values in the model_options array:

  • visual: To process visual content.
  • audio: To process audio content.

You can enable one or both model options. The platform extracts and indexes only the modalities you specify.
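The rule above can be checked on the client side before you send a request. The following is a minimal sketch, not the platform's SDK: the helper name build_model_options and the shape of the index_request dictionary are hypothetical. The sketch also assumes that at least one option is required, which follows from "one or both" above.

```python
# Values accepted in the model_options array, as listed above.
ALLOWED_MODEL_OPTIONS = {"visual", "audio"}


def build_model_options(*options: str) -> list[str]:
    """Validate and deduplicate model options (hypothetical client-side helper)."""
    if not options:
        # Assumption: an index needs at least one modality ("one or both").
        raise ValueError("enable at least one model option")
    unknown = set(options) - ALLOWED_MODEL_OPTIONS
    if unknown:
        raise ValueError(f"unsupported model options: {sorted(unknown)}")
    # Keep the caller's order and drop duplicates.
    return list(dict.fromkeys(options))


# Hypothetical request body. Only the model_options key comes from this page.
index_request = {"model_options": build_model_options("visual", "audio")}
```

An index created with only build_model_options("visual") would extract and index visual content only.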

Search options

When you search videos, specify which modalities the platform uses to find relevant matches. You can include the following values in the search_options array:

  • visual: To search visual content.
  • audio: To search audio content.

Notes
  • Search options must be a subset of the model options specified when the index was created. For example, if only the visual model option is enabled for your index, you cannot search using the audio search option.
  • You can combine multiple search options with the operator parameter to broaden or narrow your search.
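The two notes above can be sketched in a few lines. This is an illustration under stated assumptions, not the platform's implementation. The helper names are hypothetical. The operator values "or" and "and" are also assumed: here "or" keeps a match found by any search option (broader), and "and" keeps only matches found by every search option (narrower). Matches are represented by hypothetical segment IDs.

```python
def validate_search_options(search_options: list[str], model_options: list[str]) -> None:
    """Raise if a search option was not enabled as a model option for the index."""
    missing = set(search_options) - set(model_options)
    if missing:
        raise ValueError(f"search options not enabled for this index: {sorted(missing)}")


def combine_matches(matches_by_option: dict[str, set[str]], operator: str = "or") -> set[str]:
    """Combine per-modality matches (hypothetical segment IDs) with an assumed operator."""
    groups = list(matches_by_option.values())
    if not groups:
        return set()
    if operator == "or":
        # Broaden: a segment matched by any search option is kept.
        return set().union(*groups)
    if operator == "and":
        # Narrow: a segment must be matched by every search option.
        return set.intersection(*groups)
    raise ValueError(f"unsupported operator: {operator!r}")
```

For an index created with only the visual model option, validate_search_options(["audio"], ["visual"]) raises an error, which mirrors the first note.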

Embedding options

When you create video embeddings, specify the types of embeddings the platform must return. You can include the following values in the embedding_option array:

  • visual-text: To retrieve visual embeddings optimized for text search.
  • audio: To retrieve audio embeddings.
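Note that the visual value differs here: the embedding_option array uses visual-text, while model_options and search_options use visual. The following minimal sketch validates the array before a request is sent. The helper name build_embedding_option is hypothetical, and requiring at least one value is an assumption that this page does not state.

```python
# Values accepted in the embedding_option array, as listed above.
ALLOWED_EMBEDDING_OPTIONS = ("visual-text", "audio")


def build_embedding_option(options: list[str]) -> list[str]:
    """Validate and deduplicate embedding options (hypothetical client-side helper)."""
    cleaned: list[str] = []
    for option in options:
        if option == "visual":
            # "visual" is a model/search option value; embeddings use "visual-text".
            raise ValueError('use "visual-text" for visual embeddings, not "visual"')
        if option not in ALLOWED_EMBEDDING_OPTIONS:
            raise ValueError(f"unsupported embedding option: {option!r}")
        if option not in cleaned:
            cleaned.append(option)
    if not cleaned:
        # Assumption: request at least one embedding type.
        raise ValueError("request at least one embedding type")
    return cleaned
```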