The Embed.V2.Tasks interface provides methods to create embeddings asynchronously for audio and video content. Use this class for processing files longer than 10 minutes.

Note

This interface only supports Marengo version 3.0 or newer.

When to use this interface:

Process audio or video files longer than 10 minutes
Process files up to 4 hours in duration

Input requirements

Video:

Minimum duration: 4 seconds
Maximum duration: 4 hours
Maximum file size: 4 GB
Formats: FFmpeg supported formats
Resolution: 360x360 to 5184x2160 pixels
Aspect ratio: Between 1:1 and 1:2.4, or between 2.4:1 and 1:1

Audio:

Minimum duration: 4 seconds
Maximum duration: 4 hours
Maximum file size: 2 GB
Formats: WAV (uncompressed), MP3 (lossy), FLAC (lossless)

Creating embeddings asynchronously requires three steps:

Create a task using the create method. The platform returns a task ID.
Poll for the status of the task using the retrieve method. Wait until the status is ready.
Retrieve the embeddings from the response when the status is ready using the retrieve method.

Methods

List embedding tasks

Description: This method returns a list of the async embedding tasks in your account. The platform returns your async embedding tasks sorted by creation date, with the newest at the top of the list.

Function signature and example:

1 list(
2     request?: TwelvelabsApi.embed.v2.TasksListRequest,
3     requestOptions?: Tasks.RequestOptions
4 ): Promise<core.Page<TwelvelabsApi.MediaEmbeddingTask>>

Parameters

Name	Type	Required	Description
`request`	`TwelvelabsApi.embed.v2.TasksListRequest`	Yes	Parameters for listing embedding tasks.
`requestOptions`	`Tasks.RequestOptions`	No	Request-specific configuration.

The TwelvelabsApi.embed.v2.TasksListRequest interface has the following properties:

Name	Type	Required	Description
`startedAt`	`string`	No	Retrieve the embedding tasks that were created after the given date and time, expressed in the RFC 3339 format (“YYYY-MM-DDTHH:mm:ssZ”).
`endedAt`	`string`	No	Retrieve the embedding tasks that were created before the given date and time, expressed in the RFC 3339 format (“YYYY-MM-DDTHH:mm:ssZ”).
`status`	`string`	No	Filter the embedding tasks by their current status. Values: `processing`, `ready`, or `failed`.
`page`	`number`	No	A number that identifies the page to retrieve. Default: `1`.
`pageLimit`	`number`	No	The number of items to return on each page. Default: `10`. Max: `50`.

Return value

Returns a Promise that resolves to a Page<TwelvelabsApi.MediaEmbeddingTask> object that allows you to iterate through the paginated task results.

The Page<T> class contains the following properties and methods:

Name	Type	Description
`data`	`T[]`	An array containing the current page of items.
`hasNextPage()`	`boolean`	Returns whether there is a next page to load.
`getNextPage()`	`Promise<Page<T>>`	Retrieves the next page and returns the updated `Page` object.
`Symbol.asyncIterator`	`AsyncIterator<T>`	Allows iteration through all items across all pages using `for await` loops.

The TwelvelabsApi.MediaEmbeddingTask interface contains the following properties:

Name	Type	Description
`id`	`string`	The unique identifier of the embedding task.
`modelName`	`string`	The name of the video understanding model the platform used to create the embedding.
`status`	`string`	A string indicating the status of the embedding task. Values: `processing`, `ready`, or `failed`.
`createdAt`	`Date`	The date and time when the task was created.
`updatedAt`	`Date`	The date and time when the task was last updated.
`videoEmbedding`	`TwelvelabsApi.MediaEmbeddingTaskVideoEmbedding`	An object containing the metadata associated with the video embedding.
`audioEmbedding`	`TwelvelabsApi.MediaEmbeddingTaskAudioEmbedding`	An object containing the metadata associated with the audio embedding.

API Reference

List async embedding tasks

Create an async embedding task

Description: This method creates embeddings for audio and video content asynchronously.

Note

This method is rate-limited. For details, see the Rate limits page.

Function signature and example:

1 create(
2     request: TwelvelabsApi.embed.v2.CreateAsyncEmbeddingRequest,
3     requestOptions?: Tasks.RequestOptions
4 ): core.HttpResponsePromise<TwelvelabsApi.embed.v2.TasksCreateResponse>

Parameters

Name	Type	Required	Description
`request`	`TwelvelabsApi.embed.v2.CreateAsyncEmbeddingRequest`	Yes	Parameters for creating an async embedding task.
`requestOptions`	`Indexes.RequestOptions`	No	Request-specific configuration.

The TwelvelabsApi.embed.v2.CreateAsyncEmbeddingRequest interface has the following properties:

Name	Type	Required	Description
`inputType`	`TwelvelabsApi.embed.v2.CreateAsyncEmbeddingRequestInputType`	Yes	The type of content for the embeddings. Values: `audio`, `video`.
`modelName`	`TwelvelabsApi.embed.v2.CreateAsyncEmbeddingRequestModelName`	Yes	The model you wish to use. Value: `marengo3.0`.
`audio`	`TwelvelabsApi.AudioInputRequest`	No	Audio input configuration. Required when `inputType` is `audio`. See AudioInputRequest for details.
`video`	`TwelvelabsApi.VideoInputRequest`	No	Video input configuration. Required when `inputType` is `video`. See VideoInputRequest for details.

AudioInputRequest

The TwelvelabsApi.AudioInputRequest interface specifies the configuration for processing audio content. Required when inputType is audio.

Name	Type	Required	Description
`mediaSource`	`TwelvelabsApi.MediaSource`	Yes	Specifies the source of the audio file. See MediaSource for details.
`startSec`	`number`	No	The start time in seconds for processing the audio file. Use this parameter to process a portion of the audio file starting from a specific time. Default: `0` (start from the beginning).
`endSec`	`number`	No	The end time in seconds for processing the audio file. Use this parameter to process a portion of the audio file ending at a specific time. The end time must be greater than the start time. Default: End of the audio file.
`segmentation`	`TwelvelabsApi.AudioSegmentation`	No	Specifies how the platform divides the audio into segments. See AudioSegmentation for details.
`embeddingOption`	`TwelvelabsApi.AudioInputRequestEmbeddingOptionItem[]`	No	The types of embeddings you wish to generate. Values: - `audio`: Generates embeddings based on audio content (sounds, music, effects) - `transcription`: Generates embeddings based on transcribed speech You can specify multiple options to generate different types of embeddings for the same audio.
`embeddingScope`	`TwelvelabsApi.AudioInputRequestEmbeddingScopeItem[]`	No	The scope for which you wish to generate embeddings. Values: - `clip`: Generates one embedding for each segment - `asset`: Generates one embedding for the entire audio file You can specify multiple scopes to generate embeddings at different levels.
`embeddingType`	`TwelvelabsApi.AudioInputRequestEmbeddingTypeItem[]`	No	Specifies how to structure the embedding. Use this parameter only when `embeddingOption` specifies two or more values. Values: - `separate_embedding`: Returns separate embeddings per modality specified in `embeddingOption`. - `fused_embedding`: Returns a single embedding that combines all modalities into one vector. Specify both values to receive separate and fused embeddings in the same response. Default: `separate_embedding`

VideoInputRequest

The TwelvelabsApi.VideoInputRequest interface specifies the configuration for processing video content. Required when inputType is video.

Name	Type	Required	Description
`mediaSource`	`TwelvelabsApi.MediaSource`	Yes	Specifies the source of the video file. See MediaSource for details.
`startSec`	`number`	No	The start time in seconds for processing the video file. Use this parameter to process a portion of the video file starting from a specific time. Default: `0` (start from the beginning).
`endSec`	`number`	No	The end time in seconds for processing the video file. Use this parameter to process a portion of the video file ending at a specific time. The end time must be greater than the start time. Default: End of the video file.
`segmentation`	`TwelvelabsApi.VideoSegmentation`	No	Specifies how the platform divides the video into segments. See VideoSegmentation for details.
`embeddingOption`	`TwelvelabsApi.VideoInputRequestEmbeddingOptionItem[]`	No	The types of embeddings to generate for the video. Values: - `visual`: Generates embeddings based on visual content (scenes, objects, actions) - `audio`: Generates embeddings based on audio content (sounds, music, effects) - `transcription`: Generates embeddings based on transcribed speech You can specify multiple options to generate different types of embeddings for the same video. Default: `["visual", "audio", "transcription"]`.
`embeddingScope`	`TwelvelabsApi.VideoInputRequestEmbeddingScopeItem[]`	No	The scope for which you wish to generate embeddings. Values: - `clip`: Generates one embedding for each segment - `asset`: Generates one embedding for the entire video file. Use this scope for videos up to 10-30 seconds to maintain optimal performance. You can specify multiple scopes to generate embeddings at different levels. Default: `["clip", "asset"]`.
`embeddingType`	`TwelvelabsApi.VideoInputRequestEmbeddingTypeItem[]`	No	Specifies how to structure the embedding. Include this parameter only when `embeddingOption` contains at least two values. Values: - `separate_embedding`: Returns separate embeddings per modality specified in `embeddingOption`. - `fused_embedding`: Returns a single embedding that combines all modalities into one vector. Specify both values to receive separate and fused embeddings in the same response. Default: `separate_embedding`

MediaSource

The TwelvelabsApi.MediaSource interface specifies the source of the media file. Provide exactly one of the following:

Name	Type	Required	Description
`base64String`	`string`	No	The base64-encoded media data.
`url`	`string`	No	The publicly accessible URL of the media file. Use direct links to raw media files. Video hosting platforms and cloud storage sharing links are not supported.
`assetId`	`string`	No	The unique identifier of an asset from a direct or multipart upload.

AudioSegmentation

The TwelvelabsApi.AudioSegmentation interface specifies how the platform divides the audio into segments using fixed-length intervals.

Name	Type	Required	Description
`strategy`	`TwelvelabsApi.AudioSegmentationStrategy`	Yes	The segmentation strategy. Value: `fixed`.
`fixed`	`TwelvelabsApi.AudioSegmentationFixed`	Yes	Configuration for fixed segmentation. See AudioSegmentationFixed for details.

AudioSegmentationFixed

The TwelvelabsApi.AudioSegmentationFixed interface configures fixed-length segmentation for audio.

Name	Type	Required	Description
`durationSec`	`number`	Yes	The duration in seconds for each segment. The platform divides the audio into segments of this exact length. The final segment may be shorter if the audio duration is not evenly divisible. Example: With `durationSec: 5`, a 12-second audio file produces segments: [0-5s], [5-10s], [10-12s].

VideoSegmentation

The TwelvelabsApi.VideoSegmentation type specifies how the platform divides the video into segments. Use one of the following:

Fixed segmentation: Divides the video into equal-length segments:

Name	Type	Required	Description
`strategy`	`"fixed"`	Yes	The segmentation strategy. Value: `fixed`.
`fixed`	`TwelvelabsApi.VideoSegmentationFixedFixed`	Yes	Configuration for fixed segmentation. See VideoSegmentationFixedFixed for details.

Dynamic segmentation: Divides the video into adaptive segments based on scene changes:

Name	Type	Required	Description
`strategy`	`"dynamic"`	Yes	The segmentation strategy. Value: `dynamic`.
`dynamic`	`TwelvelabsApi.VideoSegmentationDynamicDynamic`	Yes	Configuration for dynamic segmentation. See VideoSegmentationDynamicDynamic for details.

VideoSegmentationFixedFixed

The TwelvelabsApi.VideoSegmentationFixedFixed interface configures fixed-length segmentation for video.

Name	Type	Required	Description
`durationSec`	`number`	Yes	The duration in seconds for each segment. The platform divides the video into segments of this exact length. The final segment may be shorter if the video duration is not evenly divisible. Example: With `durationSec: 5`, a 12-second video produces segments: [0-5s], [5-10s], [10-12s].

VideoSegmentationDynamicDynamic

The TwelvelabsApi.VideoSegmentationDynamicDynamic interface configures dynamic segmentation for video based on scene changes.

Name	Type	Required	Description
`minDurationSec`	`number`	Yes	The minimum duration in seconds for each segment. The platform divides the video into segments that are at least this long. Segments adapt to scene changes and content boundaries and may be longer than the minimum. Example: With `minDurationSec: 3`, segments might be: [0-3.2s], [3.2-7.8s], [7.8-12.1s].

Return value

Returns an HttpResponsePromise that resolves to a TwelvelabsApi.embed.v2.TasksCreateResponse object containing the task details.

The TwelvelabsApi.embed.v2.TasksCreateResponse.TasksCreateResponse interface contains the following properties:

Name	Type	Description
`id`	`string`	The unique identifier of the embedding task.
`status`	`TasksCreateResponseStatus`	The initial status of the embedding task. Value: `processing`.
`data`	`TwelvelabsApi.EmbeddingData[]`	Array of embedding results (only when status is ready).

API Reference

Create an async embedding task

Retrieve task status and results

Description: This method retrieves the status and the results of an async embedding task.

Task statuses:

processing: The platform is creating the embeddings.
ready: Processing is complete. Embeddings are available in the response.
failed: The task failed. Embeddings were not created.

Invoke this method repeatedly until the status field is ready. When status is ready, use the embeddings from the response.

Function signature and example:

1 retrieve(
2     taskId: string,
3     requestOptions?: Tasks.RequestOptions
4 ): core.HttpResponsePromise<TwelvelabsApi.EmbeddingTaskResponse>

Parameters

Name	Type	Required	Description
`taskId`	`string`	Yes	The unique identifier of the embedding task.
`requestOptions`	`Tasks.RequestOptions`	No	Request-specific configuration.

Return value

Returns an HttpResponsePromise that resolves to a TwelvelabsApi.EmbeddingTaskResponse object containing the task status and results.

The TwelvelabsApi.EmbeddingTaskResponse interface contains the following properties:

Name	Type	Description
`id`	`string`	The unique identifier of the embedding task.
`status`	`TwelvelabsApi.EmbeddingTaskResponseStatus`	The current status of the task. Values: - `processing`: The platform is creating the embeddings - `ready`: Processing is complete. Embeddings are available in the `data` field - `failed`: The task failed. The `data` field is `null`
`createdAt`	`Date`	The date and time when the task was created.
`updatedAt`	`Date`	The date and time when the task was last updated.
`data`	`TwelvelabsApi.EmbeddingData[]`	An array of embedding results. The platform returns this field when `status` is `ready`.
`metadata`	`TwelvelabsApi.EmbeddingTaskMediaMetadata`	Metadata about the embedding task.

The TwelvelabsApi.EmbeddingData interface contains the following properties:

Name	Type	Description
`embedding`	`number[]`	The embedding vector for the content.
`embeddingOption`	`TwelvelabsApi.EmbeddingDataEmbeddingOption`	The type of embedding. Values: - `visual`: Embedding based on visual content (video only) -`audio`: Embedding based on audio content -`transcription`: Embedding based on transcribed speech - `fused`: Embedding based on a combination of the modalities specified in the request. The platform returns this embedding only for video and audio content, and only when the `embeddingType` parameter in the request includes `fused_embedding` - `null`: For text and image embeddings
`embeddingScope`	`TwelvelabsApi.EmbeddingDataEmbeddingScope`	The scope of the embedding. Values: `clip`, `asset`.
`startSec`	`number`	The start time in seconds for this embedding segment.
`endSec`	`number`	The end time in seconds for this embedding segment.

API Reference

Retrieve task status and results