Modalities

Modalities represent the sources of information that the platform processes and analyzes in a video.

Visual includes:

  • Actions, objects, and events in the video.
  • Text that appears on screen (through OCR).
  • Brand logos and visual elements.

Audio includes:

  • Ambient sounds, music, and sound effects.
  • Non-speech audio only. For speech content, use the transcription modality.

Transcription includes:

  • Spoken words extracted from the audio track.

You specify modalities through different parameters depending on your task:

  • Model options: when you create an index.
  • Search options: when you search videos.
  • Embedding options: when you create video embeddings.

Model options

When you create an index, specify which modalities the platform must process. You can include the following values in the model_options array:

  • visual: To process visual content.
  • audio: To process audio content.

You can enable one or both model options. The platform processes only the modalities you specify.
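
As a minimal sketch of the rule above, the following helper checks a list of model options before you send it with a create-index request. The `build_model_entry` function, the `model_name` field, the `"example-model"` value, and the dictionary layout are assumptions made for illustration; only `model_options` and its two values come from this page. Check the Create an index reference for the actual request shape.

```python
# Modalities that model_options accepts, per the list above.
ALLOWED_MODEL_OPTIONS = ("visual", "audio")


def build_model_entry(model_name, model_options):
    """Return one model entry for a create-index request.

    `model_name` and the dictionary layout are hypothetical; only
    `model_options` and its values are taken from this page.
    """
    unknown = [opt for opt in model_options if opt not in ALLOWED_MODEL_OPTIONS]
    if unknown:
        raise ValueError(f"unsupported model options: {unknown}")
    if not model_options:
        raise ValueError("enable at least one model option")
    # Drop duplicates while keeping the caller's order.
    unique = list(dict.fromkeys(model_options))
    return {"model_name": model_name, "model_options": unique}


# "example-model" is a placeholder, not a real model name.
entry = build_model_entry("example-model", ["visual", "audio"])
print(entry)
```

Validating locally is a design convenience: it surfaces a misspelled modality before any request is made.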

Related topics

  • Python SDK Reference > Create an index
  • Node.js SDK Reference > Create an index
  • API Reference > Create an index

Search options

When you search videos, use the search_options parameter to specify which modalities the platform uses to find relevant matches.

Marengo separates audio into speech and non-speech content.

To find visual content:

Set search_options to visual to search for:

  • Actions, objects, and events in the video
  • Text that appears on screen (through OCR)
  • Brand logos and visual elements

Example use cases:

  • Finding scenes with specific objects: “red car in parking lot”
  • Locating on-screen text: “company logo on building”
  • Identifying actions: “person running”

To find non-speech audio:

Set search_options to audio to search for sounds other than human speech:

  • Musical tones and melodies
  • Beeping, alarms, and mechanical sounds
  • Environmental sounds (rain, traffic, nature)

Example use cases:

  • Finding background music: “upbeat electronic music”
  • Locating sound effects: “door slamming”
  • Identifying ambient sounds: “rainfall”

To find spoken words:

Set search_options to transcription to search the spoken content in your videos.

Example use cases:

  • Finding mentions of topics: “climate change discussion”
  • Locating product names: “iPhone 15 Pro Max”
  • Identifying speakers discussing concepts: “quarterly revenue growth”
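
The three single-modality cases above could be expressed as request parameters along these lines. This is a sketch, not the SDK's interface: the `build_search_params` helper and the `query_text` field name are assumptions, while `search_options` and its values come from this page.

```python
# Values that search_options accepts, per the sections above.
SEARCH_OPTIONS = ("visual", "audio", "transcription")


def build_search_params(query_text, search_option):
    """Build parameters for a search that uses a single modality.

    The `query_text` key is a hypothetical name for the query field.
    """
    if search_option not in SEARCH_OPTIONS:
        raise ValueError(f"unsupported search option: {search_option!r}")
    return {"query_text": query_text, "search_options": [search_option]}


# One example query per modality, taken from the use cases above.
examples = [
    build_search_params("red car in parking lot", "visual"),
    build_search_params("door slamming", "audio"),
    build_search_params("quarterly revenue growth", "transcription"),
]
for params in examples:
    print(params)
```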

Transcription options

Use the transcription_options parameter to specify how the platform matches your query against spoken words:

  • lexical: Matches the exact words or phrases in your query, allowing for minor spelling variations.
  • semantic: Matches the meaning of your query, even when the spoken words differ.

Exact word matching (lexical)

  • Matches the specific words or phrases in your query
  • Allows for minor spelling variations

Best for: Product names, technical terminology, proper nouns.

Meaning-based matching (semantic)

  • Matches the meaning of your query, even with different wording
  • Finds conceptually similar content

Best for: General concepts, topics that can be expressed in multiple ways.

Using both methods (default)

  • Specify both lexical and semantic, or omit transcription_options entirely
  • Returns the broadest set of results

Best for: Comprehensive searches where you want both exact matches and related content.
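
The default described above (omitting transcription_options behaves like specifying both values) can be sketched as a small resolver. The function names and the `query_text` key are hypothetical, and rejecting an empty list is a choice made for this sketch, not documented platform behavior.

```python
# Matching methods that transcription_options accepts, per this section.
TRANSCRIPTION_OPTIONS = ("lexical", "semantic")


def resolve_transcription_options(transcription_options=None):
    """Return the effective matching methods for a transcription search."""
    if transcription_options is None:
        # Omitted: the page states this is equivalent to using both methods.
        return list(TRANSCRIPTION_OPTIONS)
    unknown = [o for o in transcription_options if o not in TRANSCRIPTION_OPTIONS]
    if unknown:
        raise ValueError(f"unsupported transcription options: {unknown}")
    if not transcription_options:
        # Sketch-only rule: omit the parameter instead of passing an empty list.
        raise ValueError("omit transcription_options instead of passing an empty list")
    return list(dict.fromkeys(transcription_options))


def build_transcription_search(query_text, transcription_options=None):
    """Build parameters for a spoken-word search (`query_text` is hypothetical)."""
    return {
        "query_text": query_text,
        "search_options": ["transcription"],
        "transcription_options": resolve_transcription_options(transcription_options),
    }


# Exact matching suits a product name; the default suits a broad topic.
print(build_transcription_search("iPhone 15 Pro Max", ["lexical"]))
print(build_transcription_search("climate change discussion"))
```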

Combine multiple modalities

You can search across multiple modalities simultaneously by specifying multiple values for the search_options parameter. Control how results are combined using the operator parameter.

| search_options | operator | transcription_options | Result |
| --- | --- | --- | --- |
| visual, transcription | or | lexical | Product shown OR exact name spoken |
| visual, transcription | and | lexical | Product shown WHILE exact name spoken |
| visual, transcription | or | semantic | Product shown OR discussed (any wording) |
| visual, transcription | and | semantic | Product shown WHILE discussed |
| visual, audio | or | N/A | Visuals OR sounds (non-speech) |
| visual, audio | and | N/A | Visuals WITH sounds together |
| visual, audio, transcription | or | Both | Any modality matches |
| visual, audio, transcription | and | Both | All modalities match simultaneously |
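
To make the difference between the two operator values concrete, the following local simulation treats each modality's matches as a set of clip identifiers: or keeps clips matched by any selected modality, and keeps only clips matched by all of them. This is an illustrative model only. The clip IDs and the `combine_matches` helper are made up, and the platform's actual matching and ranking are not described on this page.

```python
def combine_matches(matches_by_modality, search_options, operator):
    """Combine per-modality match sets the way the table above describes."""
    sets = [set(matches_by_modality.get(opt, ())) for opt in search_options]
    if not sets:
        return set()
    if operator == "or":
        # Any selected modality may match.
        return set().union(*sets)
    if operator == "and":
        # Every selected modality must match the same clip.
        return set.intersection(*sets)
    raise ValueError(f"unsupported operator: {operator!r}")


# Hypothetical matches for a product query, keyed by modality.
matches = {
    "visual": {"clip-1", "clip-2"},
    "audio": {"clip-4"},
    "transcription": {"clip-2", "clip-3"},
}

either = combine_matches(matches, ["visual", "transcription"], "or")
both = combine_matches(matches, ["visual", "transcription"], "and")
print(sorted(either))  # ['clip-1', 'clip-2', 'clip-3']
print(sorted(both))    # ['clip-2']
```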

Related topics

  • Python SDK Reference > Make a search request
  • Node.js SDK Reference > Make a search request
  • API Reference > Make a search request

Embedding options

When you create video embeddings, specify the types of embeddings the platform must return. You can include the following values in the embedding_option array:

  • visual: To retrieve visual embeddings.
  • audio: To retrieve embeddings for non-speech audio (musical tones, beeping, environmental sounds).
  • transcription: To retrieve embeddings for transcribed speech (the actual words spoken in the video).
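
As a rough local model of what the embedding_option array selects, the sketch below filters a list of embedding segments by type. The segment records, their field names (`start`, `end`, `vector`), and the `select_segments` helper are all hypothetical and do not represent the platform's response format; only `embedding_option` and its three values come from this page.

```python
# Embedding types that embedding_option accepts, per the list above.
EMBEDDING_OPTIONS = ("visual", "audio", "transcription")


def select_segments(segments, embedding_option):
    """Keep only the segments whose type was requested."""
    unknown = [opt for opt in embedding_option if opt not in EMBEDDING_OPTIONS]
    if unknown:
        raise ValueError(f"unsupported embedding options: {unknown}")
    wanted = set(embedding_option)
    return [seg for seg in segments if seg["embedding_option"] in wanted]


# Hypothetical segments with tiny placeholder vectors.
segments = [
    {"embedding_option": "visual", "start": 0.0, "end": 6.0, "vector": [0.1, 0.2]},
    {"embedding_option": "audio", "start": 0.0, "end": 6.0, "vector": [0.3, 0.4]},
    {"embedding_option": "transcription", "start": 0.0, "end": 6.0, "vector": [0.5, 0.6]},
]

speech_only = select_segments(segments, ["transcription"])
print(len(speech_only))
```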

Related topics

  • Python SDK Reference > Create sync embeddings
  • Python SDK Reference > Create an async embedding task
  • Node.js SDK Reference > Create sync embeddings
  • Node.js SDK Reference > Create an async embedding task
  • API Reference > Create sync embeddings
  • API Reference > Create an async embedding task