Create async embeddings

The EmbedClient.V2Client.TasksClient class provides methods to create embeddings asynchronously for audio and video content. Use this class to process files longer than 10 minutes.

Note: This class only supports Marengo version 3.0 or newer.

When to use this class:

  • Process audio or video files longer than 10 minutes
  • Process files up to 4 hours in duration

Video:

  • Minimum duration: 4 seconds
  • Maximum duration: 4 hours
  • Maximum file size: 4 GB
  • Formats: FFmpeg supported formats
  • Resolution: 360x360 to 3840x2160 pixels
  • Aspect ratio: Between 1:2.4 and 2.4:1

Audio:

  • Minimum duration: 4 seconds
  • Maximum duration: 4 hours
  • Maximum file size: 2 GB
  • Formats: WAV (uncompressed), MP3 (lossy), FLAC (lossless)

Creating embeddings asynchronously requires three steps:

  1. Create a task using the create method. The platform returns a task ID.
  2. Poll the task's status using the retrieve method until the status is ready.
  3. When the status is ready, read the embeddings from the retrieve response.
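
The following is a minimal sketch of this flow. The package name, the `client.embed.v2.tasks` accessor, and the request field names are assumptions; adjust them to match your installed SDK version.

```python
import time

from twelvelabs import TwelveLabs  # assumed package and client names

client = TwelveLabs(api_key="<YOUR_API_KEY>")

# Step 1: Create a task. The platform returns a task ID.
task = client.embed.v2.tasks.create(
    input_type="video",
    model_name="marengo3.0",
    video={"media_source": {"url": "https://example.com/video.mp4"}},
)
print(f"Created task {task.id} (status: {task.status})")

# Step 2: Poll the task's status until it leaves "processing".
result = client.embed.v2.tasks.retrieve(task.id)
while result.status == "processing":
    time.sleep(5)  # back off between polls
    result = client.embed.v2.tasks.retrieve(task.id)

# Step 3: When the status is "ready", read the embeddings from the response.
if result.status == "ready":
    for item in result.data or []:
        print(item)
else:
    print(f"Task {task.id} failed; no embeddings were created.")
```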

Methods

List embedding tasks

Description: This method returns a list of the async embedding tasks in your account, sorted by creation date with the newest first.

Function signature:

```python
def list(
    self,
    *,
    started_at: typing.Optional[str] = None,
    ended_at: typing.Optional[str] = None,
    status: typing.Optional[str] = None,
    page: typing.Optional[int] = None,
    page_limit: typing.Optional[int] = None,
    request_options: typing.Optional[RequestOptions] = None,
) -> SyncPager[MediaEmbeddingTask]
```

Parameters:

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| `started_at` | `str` | No | Retrieve the embedding tasks that were created after the specified date and time, expressed in RFC 3339 format (`YYYY-MM-DDTHH:mm:ssZ`). |
| `ended_at` | `str` | No | Retrieve the embedding tasks that were created before the specified date and time, expressed in RFC 3339 format (`YYYY-MM-DDTHH:mm:ssZ`). |
| `status` | `str` | No | Filter the embedding tasks by their current status. Values: `processing`, `ready`, or `failed`. |
| `page` | `int` | No | A number that identifies the page to retrieve. Default: 1. |
| `page_limit` | `int` | No | The number of items to return on each page. Default: 10. Max: 50. |
| `request_options` | `RequestOptions` | No | Request-specific configuration. |

Return value: Returns a SyncPager[MediaEmbeddingTask] object that allows you to iterate through the paginated task results.

The SyncPager[T] class contains the following properties and methods:

| Name | Type | Description |
| --- | --- | --- |
| `items` | `Optional[List[T]]` | A list containing the current page of items. Can be `None`. |
| `has_next` | `bool` | Indicates whether there is a next page to load. |
| `get_next` | `Optional[Callable[[], Optional[SyncPager[T]]]]` | A callable that retrieves the next page. Can be `None`. |
| `response` | `Optional[BaseHttpResponse]` | The HTTP response object. Can be `None`. |
| `next_page()` | `Optional[SyncPager[T]]` | Calls `get_next()` if available and returns the next page object. |
| `__iter__()` | `Iterator[T]` | Iterates through all items across all pages, so the pager can be used directly in `for` loops. |
| `iter_pages()` | `Iterator[SyncPager[T]]` | Iterates through the page objects themselves. |
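
As an illustration of the pager mechanics, here is a hedged sketch; the `client.embed.v2.tasks` accessor is the same assumption as in the earlier sketch.

```python
pager = client.embed.v2.tasks.list()  # SyncPager[MediaEmbeddingTask]

# __iter__ transparently follows pagination, so a plain for loop
# visits every task across all pages.
for task in pager:
    print(task.id, task.status)

# Alternatively, walk the pages manually with has_next and next_page().
page = client.embed.v2.tasks.list()
while page is not None:
    print(f"Fetched a page of {len(page.items or [])} tasks")
    page = page.next_page() if page.has_next else None
```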

The MediaEmbeddingTask class contains the following properties:

| Name | Type | Description |
| --- | --- | --- |
| `id` | `Optional[str]` | The unique identifier of the embedding task. |
| `model_name` | `Optional[str]` | The name of the video understanding model the platform used to create the embedding. |
| `status` | `Optional[str]` | The status of the embedding task. Values: `processing`, `ready`, or `failed`. |
| `created_at` | `Optional[datetime]` | The date and time when the task was created. |
| `updated_at` | `Optional[datetime]` | The date and time when the task was last updated. |
| `video_embedding` | `Optional[MediaEmbeddingTaskVideoEmbedding]` | An object containing the metadata associated with the video embedding. |
| `audio_embedding` | `Optional[MediaEmbeddingTaskAudioEmbedding]` | An object containing the metadata associated with the audio embedding. |
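
For instance, a sketch of listing only ready tasks created after a given date and reading MediaEmbeddingTask fields (same accessor assumption as above):

```python
pager = client.embed.v2.tasks.list(
    status="ready",
    started_at="2024-01-01T00:00:00Z",
    page_limit=50,
)
for task in pager:
    print(task.id, task.model_name, task.created_at)
```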

API Reference: List async embedding tasks

Create an async embedding task

Description: This method creates embeddings for audio and video content asynchronously.

Function signature:

```python
def create(
    self,
    *,
    input_type: CreateAsyncEmbeddingRequestInputType,
    model_name: str,
    audio: typing.Optional[AudioInputRequest] = OMIT,
    video: typing.Optional[VideoInputRequest] = OMIT,
    request_options: typing.Optional[RequestOptions] = None,
) -> TasksCreateResponse
```

Parameters:

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| `input_type` | `CreateAsyncEmbeddingRequestInputType` | Yes | The type of content for the embedding. Values: `audio`, `video`. |
| `model_name` | `str` | Yes | The model you wish to use. Example: `marengo3.0`. |
| `audio` | `AudioInputRequest` | No | Audio input configuration. Required when `input_type` is `audio`. See AudioInputRequest for details. |
| `video` | `VideoInputRequest` | No | Video input configuration. Required when `input_type` is `video`. See VideoInputRequest for details. |
| `request_options` | `RequestOptions` | No | Request-specific configuration. |

AudioInputRequest

The AudioInputRequest class specifies configuration for processing audio content. Required when input_type is audio.

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| `media_source` | `MediaSource` | Yes | Specifies the source of the audio file. See MediaSource for details. |
| `start_sec` | `float` | No | The start time, in seconds, for processing the audio file. Use this parameter to process a portion of the file starting from a specific time. Default: 0 (start from the beginning). |
| `end_sec` | `float` | No | The end time, in seconds, for processing the audio file. The end time must be greater than the start time. Default: the end of the audio file. |
| `segmentation` | `AudioSegmentation` | No | Specifies how the platform divides the audio into segments. See AudioSegmentation for details. |
| `embedding_option` | `List[str]` | No | The types of embeddings to generate. Values: `audio` (embeddings based on audio content, such as sounds, music, and effects) and `transcription` (embeddings based on transcribed speech). You can specify multiple options to generate different types of embeddings for the same audio. |
| `embedding_scope` | `List[str]` | No | The scope at which to generate embeddings. Values: `clip` (one embedding for each segment) and `asset` (one embedding for the entire audio file). You can specify multiple scopes to generate embeddings at different levels. |
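
As a sketch, the request below creates an audio embedding task with fixed 10-second segments. The typed request classes come from the tables above, but their import path is an assumption; many generated SDKs also accept plain dictionaries in their place.

```python
from twelvelabs.types import (  # import path is an assumption
    AudioInputRequest,
    AudioSegmentation,
    AudioSegmentationFixed,
    MediaSource,
)

# `client` as initialized in the earlier sketch.
task = client.embed.v2.tasks.create(
    input_type="audio",
    model_name="marengo3.0",
    audio=AudioInputRequest(
        media_source=MediaSource(url="https://example.com/podcast.mp3"),
        end_sec=600,  # process only the first 10 minutes
        segmentation=AudioSegmentation(
            strategy="fixed",
            fixed=AudioSegmentationFixed(duration_sec=10),
        ),
        embedding_option=["audio", "transcription"],
        embedding_scope=["clip", "asset"],
    ),
)
print(task.id, task.status)
```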

VideoInputRequest

The VideoInputRequest class specifies configuration for processing video content. Required when input_type is video.

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| `media_source` | `MediaSource` | Yes | Specifies the source of the video file. See MediaSource for details. |
| `start_sec` | `float` | No | The start time, in seconds, for processing the video file. Use this parameter to process a portion of the file starting from a specific time. Default: 0 (start from the beginning). |
| `end_sec` | `float` | No | The end time, in seconds, for processing the video file. The end time must be greater than the start time. Default: the end of the video file. |
| `segmentation` | `VideoSegmentation` | No | Specifies how the platform divides the video into segments. See VideoSegmentation for details. |
| `embedding_option` | `List[str]` | No | The types of embeddings to generate. Values: `visual` (embeddings based on visual content, such as scenes, objects, and actions), `audio` (embeddings based on audio content, such as sounds, music, and effects), and `transcription` (embeddings based on transcribed speech). You can specify multiple options to generate different types of embeddings for the same video. Default: `["visual", "audio", "transcription"]`. |
| `embedding_scope` | `List[str]` | No | The scope at which to generate embeddings. Values: `clip` (one embedding for each segment) and `asset` (one embedding for the entire video file). You can specify multiple scopes to generate embeddings at different levels. Default: `["clip", "asset"]`. |
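
Similarly, a sketch of a video request that narrows the defaults to visual clip embeddings (class names from the tables above; import path assumed as before):

```python
from twelvelabs.types import MediaSource, VideoInputRequest  # assumed path

task = client.embed.v2.tasks.create(
    input_type="video",
    model_name="marengo3.0",
    video=VideoInputRequest(
        media_source=MediaSource(url="https://example.com/video.mp4"),
        # Omitting embedding_option defaults to ["visual", "audio", "transcription"],
        # and omitting embedding_scope defaults to ["clip", "asset"].
        embedding_option=["visual"],
        embedding_scope=["clip"],
    ),
)
```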

MediaSource

The MediaSource class specifies the source of the media file. Provide exactly one of the following:

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| `base64_string` | `str` | No | The base64-encoded media data. |
| `url` | `str` | No | The publicly accessible URL of the media file. Use direct links to raw media files; video hosting platforms and cloud storage sharing links are not supported. |
| `asset_id` | `str` | No | The unique identifier of an asset from a direct or multipart upload. |
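
A short sketch of the three mutually exclusive source options, assuming the MediaSource constructor accepts keyword arguments matching the fields above (import assumed as in the earlier sketches):

```python
import base64
from pathlib import Path

# Option 1: a publicly accessible direct link to the raw media file.
source = MediaSource(url="https://example.com/video.mp4")

# Option 2: base64-encoded media data read from a local file.
encoded = base64.b64encode(Path("video.mp4").read_bytes()).decode("utf-8")
source = MediaSource(base64_string=encoded)

# Option 3: an asset from a previous direct or multipart upload.
source = MediaSource(asset_id="<YOUR_ASSET_ID>")
```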

AudioSegmentation

The AudioSegmentation class specifies how the platform divides the audio into segments using fixed-length intervals.

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| `strategy` | `Literal["fixed"]` | Yes | The segmentation strategy. Value: `fixed`. |
| `fixed` | `AudioSegmentationFixed` | Yes | Configuration for fixed segmentation. See AudioSegmentationFixed for details. |

AudioSegmentationFixed

The AudioSegmentationFixed class configures fixed-length segmentation for audio.

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| `duration_sec` | `int` | Yes | The duration, in seconds, of each segment. The platform divides the audio into segments of this exact length. The final segment may be shorter if the audio duration is not evenly divisible. |

Example: With duration_sec: 5, a 12-second audio file produces segments: [0-5s], [5-10s], [10-12s].

VideoSegmentation

The VideoSegmentation type specifies how the platform divides the video into segments. Use one of the following:

Fixed segmentation: Divides the video into equal-length segments:

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| `strategy` | `Literal["fixed"]` | Yes | The segmentation strategy. Value: `fixed`. |
| `fixed` | `VideoSegmentationFixedFixed` | Yes | Configuration for fixed segmentation. See VideoSegmentationFixedFixed for details. |

Dynamic segmentation: Divides the video into adaptive segments based on scene changes:

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| `strategy` | `Literal["dynamic"]` | Yes | The segmentation strategy. Value: `dynamic`. |
| `dynamic` | `VideoSegmentationDynamicDynamic` | Yes | Configuration for dynamic segmentation. See VideoSegmentationDynamicDynamic for details. |

VideoSegmentationFixedFixed

The VideoSegmentationFixedFixed class configures fixed-length segmentation for video.

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| `duration_sec` | `int` | Yes | The duration, in seconds, of each segment. The platform divides the video into segments of this exact length. The final segment may be shorter if the video duration is not evenly divisible. |

Example: With duration_sec: 5, a 12-second video produces segments: [0-5s], [5-10s], [10-12s].

VideoSegmentationDynamicDynamic

The VideoSegmentationDynamicDynamic class configures dynamic segmentation for video based on scene changes.

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| `min_duration_sec` | `int` | Yes | The minimum duration, in seconds, of each segment. The platform divides the video into segments that are at least this long. Segments adapt to scene changes and content boundaries and may be longer than the minimum. |

Example: With min_duration_sec: 3, segments might be: [0-3.2s], [3.2-7.8s], [7.8-12.1s].
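
To illustrate the two strategies side by side, here is a sketch that passes the segmentation as plain dictionaries, which generated SDKs typically accept for union request types; verify against your SDK version.

```python
# Fixed: equal-length 5-second segments.
fixed_segmentation = {"strategy": "fixed", "fixed": {"duration_sec": 5}}

# Dynamic: scene-adaptive segments of at least 3 seconds each.
dynamic_segmentation = {"strategy": "dynamic", "dynamic": {"min_duration_sec": 3}}

task = client.embed.v2.tasks.create(
    input_type="video",
    model_name="marengo3.0",
    video={
        "media_source": {"url": "https://example.com/video.mp4"},
        "segmentation": dynamic_segmentation,
    },
)
```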

Return value: Returns a TasksCreateResponse object containing the task details.

The TasksCreateResponse class contains the following properties:

| Name | Type | Description |
| --- | --- | --- |
| `id` | `str` | The unique identifier of the embedding task. |
| `status` | `Literal["processing"]` | The initial status of the embedding task. Value: `processing`. |
| `data` | `Optional[List[EmbeddingData]]` | An array of embedding results, present only when the status is `ready`. |

API Reference: Create an async embedding task

Retrieve task status and results

Description: This method retrieves the status and the results of an async embedding task.

Task statuses:

  • processing: The platform is creating the embeddings.
  • ready: Processing is complete. Embeddings are available in the response.
  • failed: The task failed. Embeddings were not created.

Invoke this method repeatedly until the status field is ready, then read the embeddings from the response.

Function signature:

```python
def retrieve(
    self,
    task_id: str,
    *,
    request_options: typing.Optional[RequestOptions] = None,
) -> EmbeddingTaskResponse
```

Parameters:

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| `task_id` | `str` | Yes | The unique identifier of the embedding task. |
| `request_options` | `RequestOptions` | No | Request-specific configuration. |

Return value: Returns an EmbeddingTaskResponse object containing the task status and results.

The EmbeddingTaskResponse class contains the following properties:

| Name | Type | Description |
| --- | --- | --- |
| `id` | `str` | The unique identifier of the embedding task. |
| `status` | `EmbeddingTaskResponseStatus` | The current status of the task. Values: `processing` (the platform is creating the embeddings), `ready` (processing is complete and the embeddings are available in the `data` field), or `failed` (the task failed and the `data` field is `null`). |
| `created_at` | `Optional[datetime]` | The date and time when the task was created. |
| `updated_at` | `Optional[datetime]` | The date and time when the task was last updated. |
| `data` | `Optional[List[EmbeddingData]]` | An array of embedding results. The platform returns this field when the status is `ready`. |
| `metadata` | `Optional[EmbeddingTaskMediaMetadata]` | Metadata about the embedding task. |
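
A sketch of handling each status after a single retrieve call (same accessor assumption as in the earlier sketches):

```python
result = client.embed.v2.tasks.retrieve("<TASK_ID>")

if result.status == "ready":
    # Embeddings are available in the data field.
    for item in result.data or []:
        print(item)
elif result.status == "failed":
    print(f"Task {result.id} failed; no embeddings were created.")
else:
    print(f"Task {result.id} is still processing (last updated {result.updated_at}).")
```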

For details about the EmbeddingData class, see the Create an async embedding task section.

API Reference: Retrieve task status and results