Modalities
Modalities represent the sources of information that the platform processes and analyzes in a video. Choose the modalities that match your needs:
Visual includes:
- Actions, objects, and events in the video.
- Text that appears on screen (through OCR).
- Brand logos and visual elements.
Audio includes:
- Ambient sounds, music, and sound effects.
- Human speech and conversations.
You specify modalities through different parameters depending on your task:
- model_options when you create an index.
- search_options when you search videos.
- embedding_option when you create embeddings.
Model options
When you create an index, specify which modalities the platform must process. You can include the following values in the model_options array:
- visual: To process visual content.
- audio: To process audio content.
You can enable one or both model options. The platform extracts and indexes only the modalities you specify.
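As a rough illustration of the rule above, the following Python sketch validates a list of model options before it is sent anywhere. The helper name build_model_config, the "example-model" name, and the shape of the returned dictionary are assumptions made for this example and are not taken from the platform's API reference; only the visual and audio values come from this page.

```python
# Values documented on this page for the model_options array.
ALLOWED_MODEL_OPTIONS = {"visual", "audio"}


def build_model_config(model_name, model_options):
    """Return a dict pairing a model with the modalities it should index.

    Hypothetical helper: the dict layout is an assumption, not the
    platform's confirmed request schema.
    """
    # Drop duplicates while keeping the caller's order.
    options = list(dict.fromkeys(model_options))
    if not options:
        raise ValueError("Enable at least one model option.")
    unknown = set(options) - ALLOWED_MODEL_OPTIONS
    if unknown:
        raise ValueError(f"Unsupported model options: {sorted(unknown)}")
    return {"model_name": model_name, "model_options": options}


# "example-model" is a placeholder name, not a real model identifier.
config = build_model_config("example-model", ["visual", "audio"])
print(config)
```

Validating on the client side is optional. It only surfaces a typo such as "video" before the request is made.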
Search options
When you search videos, specify which modalities the platform uses to find relevant matches. You can include the following values in the search_options array:
- visual: To search visual content.
- audio: To search audio content.
Notes
- Search options must be a subset of the model options specified when the index was created. For example, if only the visual model option is enabled for your index, you cannot search using the audio search option.
- You can combine multiple search options with the operator parameter to broaden or narrow your search.
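The subset rule in the first note can be checked before a search request is issued. The sketch below is a minimal, self-contained illustration. The function name check_search_options is hypothetical, and the code assumes you already know which model options the index was created with. It does not call the platform.

```python
def check_search_options(search_options, index_model_options):
    """Return the requested search options if the index supports all of them.

    Hypothetical helper illustrating the subset rule; it performs no
    network calls.
    """
    requested = set(search_options)
    enabled = set(index_model_options)
    # Anything requested but not enabled on the index cannot be searched.
    missing = requested - enabled
    if missing:
        raise ValueError(
            f"Search options {sorted(missing)} are not enabled for this index; "
            f"enabled model options: {sorted(enabled)}"
        )
    return sorted(requested)


# Index created with only the visual model option.
print(check_search_options(["visual"], ["visual"]))

try:
    check_search_options(["audio"], ["visual"])
except ValueError as err:
    print(err)
```

The second call mirrors the example in the note: an index with only visual enabled rejects an audio search.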
Embedding options
When you create video embeddings, specify the types of embeddings the platform must return. You can include the following values in the embedding_option array:
- visual-text: To retrieve visual embeddings optimized for text search.
- audio: To retrieve audio embeddings.