Create sync embeddings

The Embed.V2 interface provides methods to create embeddings synchronously for multimodal content. This endpoint returns embeddings immediately in the response.

Note

This interface only supports Marengo version 3.0 or newer.

When to use this interface:

  • Create embeddings for text, images, audio, or video content
  • Get immediate results without waiting for background processing
  • Process audio or video content up to 10 minutes in duration

Do not use this interface for:

  • Audio or video content longer than 10 minutes. Use the asynchronous embedding tasks method instead.

Methods

Create embeddings

Description: This method synchronously creates embeddings for multimodal content and returns the results immediately in the response.

Text:

  • Maximum length: 500 tokens

Images:

  • Formats: JPEG, PNG
  • Minimum size: 128x128 pixels
  • Maximum file size: 5 MB

Audio and video:

  • Maximum duration: 10 minutes
  • Maximum file size for base64 encoded strings: 36 MB
  • Audio formats: WAV (uncompressed), MP3 (lossy), FLAC (lossless)
  • Video formats: FFmpeg supported formats
  • Video resolution: 360x360 to 5184x2160 pixels
  • Aspect ratio: Between 1:2.4 and 2.4:1

Note

This method is rate-limited. For details, see the Rate limits page.

Function signature:

create(
  request: TwelvelabsApi.embed.CreateEmbeddingsRequest,
  requestOptions?: V2.RequestOptions
): core.HttpResponsePromise<TwelvelabsApi.EmbeddingSuccessResponse>
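A minimal usage sketch for a text embedding. The payload shape follows the request tables below; the client setup and import path are not shown in this reference, so the call site is indicated only as a comment, and just the payload construction runs here:

```typescript
// Request payload for a synchronous text embedding, shaped per
// TwelvelabsApi.embed.CreateEmbeddingsRequest (field names from the tables below).
const request = {
  inputType: "text",
  modelName: "marengo3.0",
  text: { inputText: "A man surfing at sunset" }, // maximum 500 tokens
};

// Hypothetical call site (assumes a configured `client` instance):
// const response = await client.embed.v2.create(request);
// response.data holds the embedding results.

console.log(request.inputType); // → "text"
```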

Parameters:

  • request (TwelvelabsApi.embed.CreateEmbeddingsRequest, required): Parameters for creating embeddings.
  • requestOptions (V2.RequestOptions, optional): Request-specific configuration.

The TwelvelabsApi.embed.CreateEmbeddingsRequest interface has the following properties:

  • inputType (TwelvelabsApi.embed.CreateEmbeddingsRequestInputType, required): The type of content for the embeddings. Values: text, image, text_image, audio, video.
  • modelName (TwelvelabsApi.embed.CreateEmbeddingsRequestModelName, required): The video understanding model you wish to use. Value: marengo3.0.
  • text (TwelvelabsApi.TextInputRequest, optional): Text input configuration. Required when inputType is text. See TextInputRequest for details.
  • image (TwelvelabsApi.ImageInputRequest, optional): Image input configuration. Required when inputType is image. See ImageInputRequest for details.
  • textImage (TwelvelabsApi.TextImageInputRequest, optional): Combined text and image input configuration. Required when inputType is text_image. See TextImageInputRequest for details.
  • audio (TwelvelabsApi.AudioInputRequest, optional): Audio input configuration. Required when inputType is audio. See AudioInputRequest for details.
  • video (TwelvelabsApi.VideoInputRequest, optional): Video input configuration. Required when inputType is video. See VideoInputRequest for details.

TextInputRequest

The TwelvelabsApi.TextInputRequest interface specifies configuration for processing text content. Required when inputType is text.

  • inputText (string, required): The text for which you wish to create an embedding. The maximum length is 500 tokens.

ImageInputRequest

The TwelvelabsApi.ImageInputRequest interface specifies configuration for processing image content. Required when inputType is image.

  • mediaSource (TwelvelabsApi.MediaSource, required): Specifies the source of the image file. See MediaSource for details.

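A payload sketch for an image embedding. The URL is a placeholder; in practice it must be a direct link to a raw JPEG or PNG file that meets the size limits above:

```typescript
// Image embedding payload: mediaSource must provide exactly one source field.
const imageRequest = {
  inputType: "image",
  modelName: "marengo3.0",
  image: {
    mediaSource: {
      url: "https://example.com/media/photo.jpg", // placeholder direct link (JPEG/PNG, ≥128x128, ≤5 MB)
    },
  },
};

console.log(Object.keys(imageRequest.image.mediaSource).length); // → 1
```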
TextImageInputRequest

The TwelvelabsApi.TextImageInputRequest interface specifies configuration for processing combined text and image content. Required when inputType is text_image.

  • mediaSource (TwelvelabsApi.MediaSource, required): Specifies the source of the image file. See MediaSource for details.
  • inputText (string, required): The text for which you wish to create an embedding. The maximum length is 500 tokens.
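A payload sketch for a combined text and image embedding. The URL and text are placeholders:

```typescript
// text_image combines one image source with an input text in a single payload.
const textImageRequest = {
  inputType: "text_image",
  modelName: "marengo3.0",
  textImage: {
    mediaSource: { url: "https://example.com/media/photo.png" }, // placeholder direct link
    inputText: "A red bicycle leaning against a brick wall",     // maximum 500 tokens
  },
};

console.log(textImageRequest.inputType); // → "text_image"
```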

AudioInputRequest

The TwelvelabsApi.AudioInputRequest interface specifies configuration for processing audio content. Required when inputType is audio.

  • mediaSource (TwelvelabsApi.MediaSource, required): Specifies the source of the audio file. See MediaSource for details.
  • startSec (number, optional): The start time in seconds for processing the audio file. Use this parameter to process a portion of the audio file starting from a specific time. Default: 0 (start from the beginning).
  • endSec (number, optional): The end time in seconds for processing the audio file. Use this parameter to process a portion of the audio file ending at a specific time. The end time must be greater than the start time. Default: end of the audio file.
  • segmentation (TwelvelabsApi.AudioSegmentation, optional): Specifies how the platform divides the audio into segments. When combined with embeddingScope=["clip"], creates separate embeddings for each segment. Use this to generate embeddings for specific portions of your audio. See AudioSegmentation for details.
  • embeddingOption (TwelvelabsApi.AudioInputRequestEmbeddingOptionItem[], optional): The types of embeddings you wish to generate. You can specify multiple options to generate different types of embeddings for the same audio. Values:
      - audio: Generates embeddings based on audio content (sounds, music, effects)
      - transcription: Generates embeddings based on transcribed speech
  • embeddingScope (TwelvelabsApi.AudioInputRequestEmbeddingScopeItem[], optional): The scope for which you wish to generate embeddings. You can specify multiple scopes to generate embeddings at different levels. Values:
      - clip: Generates one embedding for each segment
      - asset: Generates one embedding for the entire audio file
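A payload sketch combining the audio options above: a trimmed range, fixed 10-second segments, both embedding types, and both scopes. The URL and time range are placeholders:

```typescript
// Audio embedding payload: fixed 10-second segments, sound-based and
// transcription-based embeddings, at both clip and asset scope.
const audioRequest = {
  inputType: "audio",
  modelName: "marengo3.0",
  audio: {
    mediaSource: { url: "https://example.com/media/podcast.mp3" }, // placeholder direct link
    startSec: 0,
    endSec: 120, // process only the first two minutes
    segmentation: { strategy: "fixed", fixed: { durationSec: 10 } },
    embeddingOption: ["audio", "transcription"],
    embeddingScope: ["clip", "asset"],
  },
};

console.log(audioRequest.audio.segmentation.fixed.durationSec); // → 10
```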

VideoInputRequest

The TwelvelabsApi.VideoInputRequest interface specifies configuration for processing video content. Required when inputType is video.

  • mediaSource (TwelvelabsApi.MediaSource, required): Specifies the source of the video file. See MediaSource for details.
  • startSec (number, optional): The start time in seconds for processing the video file. Use this parameter to process a portion of the video file starting from a specific time. Default: 0 (start from the beginning).
  • endSec (number, optional): The end time in seconds for processing the video file. Use this parameter to process a portion of the video file ending at a specific time. The end time must be greater than the start time. Default: end of the video file.
  • segmentation (TwelvelabsApi.VideoSegmentation, optional): Specifies how the platform divides the video into segments. When combined with embeddingScope=["clip"], creates separate embeddings for each segment. Supports fixed-duration segments or dynamic segmentation that adapts to scene changes. See VideoSegmentation for details.
  • embeddingOption (TwelvelabsApi.VideoInputRequestEmbeddingOptionItem[], optional): The types of embeddings to generate for the video. You can specify multiple options to generate different types of embeddings for the same video. Default: ["visual", "audio", "transcription"]. Values:
      - visual: Generates embeddings based on visual content (scenes, objects, actions)
      - audio: Generates embeddings based on audio content (sounds, music, effects)
      - transcription: Generates embeddings based on transcribed speech
  • embeddingScope (TwelvelabsApi.VideoInputRequestEmbeddingScopeItem[], optional): The scope for which you wish to generate embeddings. You can specify multiple scopes to generate embeddings at different levels. Default: ["clip", "asset"]. Values:
      - clip: Generates one embedding for each segment
      - asset: Generates one embedding for the entire video file. For optimal performance, use this scope for short videos (roughly 10-30 seconds).
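A payload sketch for a video embedding using dynamic (scene-adaptive) segmentation. The asset ID is a placeholder:

```typescript
// Video embedding payload: dynamic segmentation with clip-level embeddings only.
const videoRequest = {
  inputType: "video",
  modelName: "marengo3.0",
  video: {
    mediaSource: { assetId: "64f8a1b2c3d4e5f6a7b8c9d0" }, // placeholder asset ID
    segmentation: { strategy: "dynamic", dynamic: { minDurationSec: 3 } },
    embeddingOption: ["visual", "audio", "transcription"], // the default set
    embeddingScope: ["clip"], // one embedding per scene-based segment
  },
};

console.log(videoRequest.video.segmentation.strategy); // → "dynamic"
```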

MediaSource

The TwelvelabsApi.MediaSource interface specifies the source of the media file. Provide exactly one of the following:

  • base64String (string, optional): The base64-encoded media data.
  • url (string, optional): The publicly accessible URL of the media file. Use direct links to raw media files. Video hosting platforms and cloud storage sharing links are not supported.
  • assetId (string, optional): The unique identifier of an asset from a direct or multipart upload.
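Because exactly one source field must be provided, a client-side check like the following (a hypothetical helper, not part of the SDK) can catch malformed payloads before a request is sent:

```typescript
interface MediaSource {
  base64String?: string;
  url?: string;
  assetId?: string;
}

// Returns true when exactly one of the three source fields is set.
function hasExactlyOneSource(source: MediaSource): boolean {
  const provided = [source.base64String, source.url, source.assetId].filter(
    (v) => v !== undefined
  );
  return provided.length === 1;
}

console.log(hasExactlyOneSource({ url: "https://example.com/clip.mp4" })); // → true
console.log(hasExactlyOneSource({ url: "https://example.com/clip.mp4", assetId: "abc" })); // → false
```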

AudioSegmentation

The TwelvelabsApi.AudioSegmentation interface specifies how the platform divides the audio into segments using fixed-length intervals.

  • strategy ("fixed", required): The segmentation strategy. Value: fixed.
  • fixed (TwelvelabsApi.AudioSegmentationFixed, required): Configuration for fixed segmentation. See AudioSegmentationFixed for details.

AudioSegmentationFixed

The TwelvelabsApi.AudioSegmentationFixed interface configures fixed-length segmentation for audio.

  • durationSec (number, required): The duration in seconds for each segment. The platform divides the audio into segments of this exact length. The final segment may be shorter if the audio duration is not evenly divisible.

Example: With durationSec: 5, a 12-second audio file produces segments: [0-5s], [5-10s], [10-12s].
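The segment boundaries from the example above can be reproduced with a small helper (an illustration of the fixed strategy, not SDK code):

```typescript
// Splits a total duration into fixed-length segments; the final segment
// may be shorter when the duration is not evenly divisible.
function fixedSegments(totalSec: number, durationSec: number): [number, number][] {
  const segments: [number, number][] = [];
  for (let start = 0; start < totalSec; start += durationSec) {
    segments.push([start, Math.min(start + durationSec, totalSec)]);
  }
  return segments;
}

const segments = fixedSegments(12, 5); // [[0, 5], [5, 10], [10, 12]]
```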

VideoSegmentation

The TwelvelabsApi.VideoSegmentation type specifies how the platform divides the video into segments. Use one of the following:

Fixed segmentation: Divides the video into equal-length segments:

  • strategy ("fixed", required): The segmentation strategy. Value: fixed.
  • fixed (TwelvelabsApi.VideoSegmentationFixedFixed, required): Configuration for fixed segmentation. See VideoSegmentationFixedFixed for details.

Dynamic segmentation: Divides the video into adaptive segments based on scene changes:

  • strategy ("dynamic", required): The segmentation strategy. Value: dynamic.
  • dynamic (TwelvelabsApi.VideoSegmentationDynamicDynamic, required): Configuration for dynamic segmentation. See VideoSegmentationDynamicDynamic for details.

VideoSegmentationFixedFixed

The TwelvelabsApi.VideoSegmentationFixedFixed interface configures fixed-length segmentation for video.

  • durationSec (number, required): The duration in seconds for each segment. The platform divides the video into segments of this exact length. The final segment may be shorter if the video duration is not evenly divisible.

Example: With durationSec: 5, a 12-second video produces segments: [0-5s], [5-10s], [10-12s].

VideoSegmentationDynamicDynamic

The TwelvelabsApi.VideoSegmentationDynamicDynamic interface configures dynamic segmentation for video based on scene changes.

  • minDurationSec (number, required): The minimum duration in seconds for each segment. The platform divides the video into segments that are at least this long. Segments adapt to scene changes and content boundaries and may be longer than the minimum.

Example: With minDurationSec: 3, segments might be: [0-3.2s], [3.2-7.8s], [7.8-12.1s].

Return value: Returns an HttpResponsePromise that resolves to a TwelvelabsApi.EmbeddingSuccessResponse object containing the embedding results.

The TwelvelabsApi.EmbeddingSuccessResponse interface contains the following properties:

  • data (TwelvelabsApi.EmbeddingData[]): Array of embedding results.
  • metadata (TwelvelabsApi.EmbeddingMediaMetadata): Metadata about the media content.

The TwelvelabsApi.EmbeddingData interface contains the following properties:

  • embedding (number[]): The embedding vector for the content.
  • embeddingOption (TwelvelabsApi.EmbeddingDataEmbeddingOption): The type of embedding. Values: visual, audio, transcription.
  • embeddingScope (TwelvelabsApi.EmbeddingDataEmbeddingScope): The scope of the embedding. Values: clip, asset.
  • startSec (number): The start time in seconds for this embedding segment.
  • endSec (number): The end time in seconds for this embedding segment.
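A sketch of how the response data might be consumed, using a hand-built stand-in for the response (the embedding values here are fabricated; real ones come from the API call):

```typescript
interface EmbeddingData {
  embedding: number[];
  embeddingOption: "visual" | "audio" | "transcription";
  embeddingScope: "clip" | "asset";
  startSec: number;
  endSec: number;
}

// Stand-in for EmbeddingSuccessResponse.data with fabricated values.
const data: EmbeddingData[] = [
  { embedding: [0.1, -0.2, 0.3], embeddingOption: "visual", embeddingScope: "clip", startSec: 0, endSec: 5 },
  { embedding: [0.4, 0.0, -0.1], embeddingOption: "visual", embeddingScope: "asset", startSec: 0, endSec: 12 },
];

// Keep only clip-level visual embeddings, e.g. to index individual segments.
const clipVectors = data
  .filter((d) => d.embeddingScope === "clip" && d.embeddingOption === "visual")
  .map((d) => d.embedding);

console.log(clipVectors.length); // → 1
```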

API Reference: Create sync embeddings