Modalities

Modalities represent the types of information that the platform processes and analyzes in a video. These modalities are central to both indexing and searching video content.

The platform supports the following modalities:

  • Visual: Analyzes visual content in a video, including actions, objects, events, text (through Optical Character Recognition, or OCR), and brand logos.
  • Audio: Analyzes audio content in a video, including ambient sounds, music, and human speech.

Model options

When you create an index, you must specify the modalities that the platform processes. This determines what information is extracted and indexed from your videos. You can enable one or both modalities, depending on your needs.

See the Create indexes page for details on selecting the desired models and model options.
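
As a rough illustration of how model options might be chosen at index creation, the sketch below builds and validates a configuration before any request is sent. It is not the platform's API: the function `build_index_config`, the field names `index_name`, `models`, `model_name`, and `model_options`, and the model name `example-model` are all hypothetical, introduced only for this example.

```python
# Hypothetical sketch: the field names and helper below are assumptions,
# not the platform's actual request schema.
SUPPORTED_MODALITIES = {"visual", "audio"}


def build_index_config(index_name, model_name, model_options):
    """Return a dict describing an index with the chosen modalities."""
    # Drop duplicates while keeping the caller's order.
    options = list(dict.fromkeys(model_options))
    if not options:
        raise ValueError("enable at least one modality")
    unknown = set(options) - SUPPORTED_MODALITIES
    if unknown:
        raise ValueError(f"unsupported modalities: {sorted(unknown)}")
    return {
        "index_name": index_name,
        "models": [{"model_name": model_name, "model_options": options}],
    }


# Enable both modalities for a new index.
config = build_index_config("my-index", "example-model", ["visual", "audio"])
print(config)
```

Validating the options locally catches an empty or misspelled modality before it reaches the platform. The modalities chosen here also determine which search options are available later.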

Search options

When you perform a search, you must specify the modalities that the video understanding model uses to find relevant information.

Constraints:

  • Search options must be a subset of the model options specified when the index was created. For example, if only the visual modality was enabled during indexing, you cannot search using the audio modality.
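  • When you specify more than one search option, you can combine them with an operator to broaden your search (for example, by matching any of the selected modalities) or narrow it (for example, by requiring a match in all of them).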
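
The two constraints above can be sketched in a few lines of Python. This is a minimal illustration, not the platform's implementation: the helper names `check_search_options` and `combine_matches`, the operator values `"or"` and `"and"`, and the clip IDs are assumptions made for the example.

```python
# Hypothetical sketch: helper names, operator values, and clip IDs are
# assumptions chosen for illustration.
def check_search_options(index_model_options, search_options):
    """Ensure the search options are a subset of the index's model options."""
    enabled = set(index_model_options)
    requested = set(search_options)
    if not requested:
        raise ValueError("specify at least one search option")
    missing = requested - enabled
    if missing:
        raise ValueError(f"not enabled on this index: {sorted(missing)}")
    return sorted(requested)


def combine_matches(matches_by_option, operator="or"):
    """Combine per-modality match sets: 'or' broadens, 'and' narrows."""
    sets = [set(ids) for ids in matches_by_option.values()]
    if not sets:
        return set()
    if operator == "or":
        return set().union(*sets)      # a match in any modality counts
    if operator == "and":
        return set.intersection(*sets)  # a match is needed in every modality
    raise ValueError(f"unknown operator: {operator!r}")


# An index created with only the visual modality rejects an audio search.
try:
    check_search_options(["visual"], ["audio"])
except ValueError as err:
    print(err)

matches = {"visual": {"clip-1", "clip-2"}, "audio": {"clip-2", "clip-3"}}
broad = combine_matches(matches, "or")
narrow = combine_matches(matches, "and")
print(sorted(broad), sorted(narrow))
```

Treating each modality's results as a set makes the effect of the operator easy to see: a union can only grow the result, while an intersection can only shrink it.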

For examples of using search options, see the Text queries page.