Create embeddings
Use the Marengo video understanding model to generate embeddings from video, audio, text, and image inputs. These embeddings enable similarity search, content clustering, recommendation systems, and other machine learning applications.
Note
Amazon Bedrock supports Marengo 3.0 and Marengo 2.7. Marengo 2.7 will be deprecated in a future release. Migrate to Marengo 3.0 to ensure continued support and access to new features. For details, see the Migration guide page.
For the key enhancements in Marengo 3.0, see the TwelveLabs Marengo Embed 3.0 page in the Amazon Bedrock documentation.
Regional availability
Marengo is available in the following regions: US East (N. Virginia), Europe (Ireland), Asia Pacific (Seoul)
Model specification
The model has two types of limits: the maximum input size you can submit and the portion of content that it embeds.
Input requirements
This table shows the maximum size for each type of input:
Embedding coverage per input type
This table shows what portion of your input the model processes into embeddings:
Pricing
For details on pricing, see the Amazon Bedrock pricing page.
Choose the processing method
Select the processing method based on your use case and performance requirements. Synchronous processing returns embeddings immediately in the API response, while asynchronous processing handles larger files and batch operations by saving results to S3.
Note
Synchronous processing supports text, image, and combined text-and-image inputs. Asynchronous processing supports video, audio, image, text, and combined text-and-image inputs.
Use synchronous processing to:
- Build real-time applications like chatbots, search, and recommendation systems.
- Enable interactive features that require immediate results.
Use asynchronous processing to:
- Build applications that process video, audio, and image files.
- Run batch operations and background workflows.
Prerequisites
Before you start, ensure you have the following:
- An AWS account with access to a region where the TwelveLabs models are supported.
- An AWS IAM principal with sufficient Amazon Bedrock permissions. For details on setting permissions, see the Identity and access management for Amazon Bedrock page.
- S3 permissions to read input files and write output files for Marengo operations.
- The AWS CLI installed and configured with your credentials.
- Python 3.7 or later with the `boto3` library.
- Access to the model you want to use. Navigate to the AWS Console > Bedrock > Model Access page and request access. Note that model availability varies by region.
Create embeddings
Marengo supports base64-encoded strings and S3 URIs for media input. Note that the base64 method has a 36 MB file size limit. This guide uses S3 URIs.
Note
Your S3 input and output buckets must be in the same region as the model. If regions don’t match, the API returns a ValidationException error.
To generate embeddings from your content, you use one of two Amazon Bedrock APIs, depending on your processing needs.
Synchronous processing
The InvokeModel API processes your request synchronously and returns embeddings directly in the response.
The InvokeModel API requires two parameters:
- `modelId`: The inference profile ID for the model.
- `body`: A JSON-encoded string containing your input parameters.
The request body contains the following fields:
- `inputType`: The type of content. Values: `text`, `image`, or `text_image`.
- For text inputs, include a string named `inputText` with the text to embed.
- For image inputs, include an object named `image` with the following field:
  - `mediaSource`: The image source containing either `base64String` or `s3Location`.
- For text with image inputs, include an object named `text_image` with the following fields:
  - `inputText`: The text to embed.
  - `mediaSource`: The image source containing either `base64String` or `s3Location`.
Examples
Text
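The following is a minimal boto3 sketch of a synchronous text request, assuming the request body shape described above. The region, inference profile ID, and text are placeholders; look up the exact inference profile ID for Marengo in your region in the Amazon Bedrock console.

```python
import json

import boto3

# Create a Bedrock Runtime client in a region where Marengo is available.
client = boto3.client("bedrock-runtime", region_name="<YOUR_REGION>")

# Invoke the model synchronously with a text input.
response = client.invoke_model(
    modelId="<INFERENCE_PROFILE_ID>",
    body=json.dumps({
        "inputType": "text",
        "inputText": "<YOUR_TEXT>",
    }),
)

# The response body is a JSON document containing the embedding.
print(json.loads(response["body"].read()))
```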
Image
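A similar sketch for an image stored in S3. The `uri` and `bucketOwner` field names inside `s3Location` are assumptions based on the Bedrock media-source structure; verify them against the model reference for your version.

```python
import json

import boto3

client = boto3.client("bedrock-runtime", region_name="<YOUR_REGION>")

# Invoke the model synchronously with an image stored in S3.
response = client.invoke_model(
    modelId="<INFERENCE_PROFILE_ID>",
    body=json.dumps({
        "inputType": "image",
        "image": {
            "mediaSource": {
                "s3Location": {
                    "uri": "s3://<YOUR_BUCKET_NAME>/<YOUR_FILE>",
                    "bucketOwner": "<YOUR_ACCOUNT_ID>",  # Account that owns the bucket
                }
            }
        },
    }),
)

print(json.loads(response["body"].read()))
```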
Text with image
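A sketch of a combined text-and-image request, assuming the `text_image` object shape described above.

```python
import json

import boto3

client = boto3.client("bedrock-runtime", region_name="<YOUR_REGION>")

# Invoke the model synchronously with paired text and image inputs.
response = client.invoke_model(
    modelId="<INFERENCE_PROFILE_ID>",
    body=json.dumps({
        "inputType": "text_image",
        "text_image": {
            "inputText": "<YOUR_TEXT>",
            "mediaSource": {
                "s3Location": {
                    "uri": "s3://<YOUR_BUCKET_NAME>/<YOUR_FILE>",
                    "bucketOwner": "<YOUR_ACCOUNT_ID>",
                }
            },
        },
    }),
)

print(json.loads(response["body"].read()))
```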
Ensure you replace the placeholders surrounded by <> with your values.
Asynchronous processing
The StartAsyncInvoke API processes your request asynchronously, storing the results in your S3 bucket.
To create embeddings asynchronously, you complete the following steps:
1. Start an embedding job by calling the StartAsyncInvoke API.
2. Monitor the job status until it completes.
3. Retrieve the results from your S3 bucket.
The StartAsyncInvoke API requires three parameters:
- `modelId`: The model ID.
- `modelInput`: A dictionary containing your input parameters.
- `outputDataConfig`: A dictionary specifying where to save the results.
The `modelInput` dictionary contains the following fields:
- `inputType`: The type of content. Values: `video`, `audio`, `image`, `text`, or `text_image`.
- For video inputs, include an object named `video` containing at least the following field:
  - `mediaSource`: The S3 location of your video file.
- For audio inputs, include an object named `audio` containing at least the following field:
  - `mediaSource`: The S3 location of your audio file.
- For image inputs, include an object named `image` with:
  - `mediaSource`: The S3 location of your image file.
- For text inputs, include an object named `text` with:
  - `inputText`: The text to embed.
- For text with image inputs, include an object named `text_image` with:
  - `inputText`: The text to embed.
  - `mediaSource`: The S3 location of your image file.
S3 output structure
Each invocation creates a unique directory in your S3 bucket with two files:
- `manifest.json`: Contains metadata, including the request ID.
- `output.json`: Contains the embeddings.
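For example, a minimal sketch of downloading and parsing the results, assuming you know the invocation-specific directory name (it is created per job, so the key below is a placeholder):

```python
import json

import boto3

s3 = boto3.client("s3")

# Fetch output.json from the directory that the invocation created.
obj = s3.get_object(
    Bucket="<YOUR_BUCKET_NAME>",
    Key="<INVOCATION_DIRECTORY>/output.json",
)
embeddings = json.loads(obj["Body"].read())
print(embeddings)
```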
Examples
Video
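A minimal sketch of the full asynchronous flow for a video file, assuming the `modelInput` shape described above. The model ID and S3 locations are placeholders; the status values (`InProgress`, `Completed`, `Failed`) follow the Bedrock asynchronous invocation API.

```python
import time

import boto3

client = boto3.client("bedrock-runtime", region_name="<YOUR_REGION>")

# Step 1: Start the asynchronous embedding job.
response = client.start_async_invoke(
    modelId="<MODEL_ID>",
    modelInput={
        "inputType": "video",
        "video": {
            "mediaSource": {
                "s3Location": {
                    "uri": "s3://<YOUR_BUCKET_NAME>/<YOUR_FILE>",
                    "bucketOwner": "<YOUR_ACCOUNT_ID>",
                }
            }
        },
    },
    outputDataConfig={
        "s3OutputDataConfig": {"s3Uri": "s3://<YOUR_BUCKET_NAME>/<OUTPUT_PREFIX>/"}
    },
)
invocation_arn = response["invocationArn"]

# Step 2: Poll the job status until it leaves the InProgress state.
while True:
    job = client.get_async_invoke(invocationArn=invocation_arn)
    if job["status"] != "InProgress":
        break
    time.sleep(10)

# Step 3: On success, read output.json from the output S3 location.
print(job["status"])
```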
Text inputs
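For text, only `modelInput` changes; the polling pattern from the video example applies unchanged.

```python
import boto3

client = boto3.client("bedrock-runtime", region_name="<YOUR_REGION>")

# Start an asynchronous embedding job for a text input.
response = client.start_async_invoke(
    modelId="<MODEL_ID>",
    modelInput={
        "inputType": "text",
        "text": {"inputText": "<YOUR_TEXT>"},
    },
    outputDataConfig={
        "s3OutputDataConfig": {"s3Uri": "s3://<YOUR_BUCKET_NAME>/<OUTPUT_PREFIX>/"}
    },
)
print(response["invocationArn"])
```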
Audio inputs
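For audio, substitute the `audio` object:

```python
import boto3

client = boto3.client("bedrock-runtime", region_name="<YOUR_REGION>")

# Start an asynchronous embedding job for an audio file in S3.
response = client.start_async_invoke(
    modelId="<MODEL_ID>",
    modelInput={
        "inputType": "audio",
        "audio": {
            "mediaSource": {
                "s3Location": {
                    "uri": "s3://<YOUR_BUCKET_NAME>/<YOUR_FILE>",
                    "bucketOwner": "<YOUR_ACCOUNT_ID>",
                }
            }
        },
    },
    outputDataConfig={
        "s3OutputDataConfig": {"s3Uri": "s3://<YOUR_BUCKET_NAME>/<OUTPUT_PREFIX>/"}
    },
)
print(response["invocationArn"])
```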
Image inputs
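For an image, substitute the `image` object:

```python
import boto3

client = boto3.client("bedrock-runtime", region_name="<YOUR_REGION>")

# Start an asynchronous embedding job for an image file in S3.
response = client.start_async_invoke(
    modelId="<MODEL_ID>",
    modelInput={
        "inputType": "image",
        "image": {
            "mediaSource": {
                "s3Location": {
                    "uri": "s3://<YOUR_BUCKET_NAME>/<YOUR_FILE>",
                    "bucketOwner": "<YOUR_ACCOUNT_ID>",
                }
            }
        },
    },
    outputDataConfig={
        "s3OutputDataConfig": {"s3Uri": "s3://<YOUR_BUCKET_NAME>/<OUTPUT_PREFIX>/"}
    },
)
print(response["invocationArn"])
```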
Text and image inputs
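For combined text and image, substitute the `text_image` object:

```python
import boto3

client = boto3.client("bedrock-runtime", region_name="<YOUR_REGION>")

# Start an asynchronous embedding job for paired text and image inputs.
response = client.start_async_invoke(
    modelId="<MODEL_ID>",
    modelInput={
        "inputType": "text_image",
        "text_image": {
            "inputText": "<YOUR_TEXT>",
            "mediaSource": {
                "s3Location": {
                    "uri": "s3://<YOUR_BUCKET_NAME>/<YOUR_FILE>",
                    "bucketOwner": "<YOUR_ACCOUNT_ID>",
                }
            },
        },
    },
    outputDataConfig={
        "s3OutputDataConfig": {"s3Uri": "s3://<YOUR_BUCKET_NAME>/<OUTPUT_PREFIX>/"}
    },
)
print(response["invocationArn"])
```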
Ensure you replace the placeholders surrounded by <> with your values.
Use embeddings
After generating embeddings, you can store them in a vector database for efficient similarity search and retrieval.
The typical workflow is as follows:
1. Generate embeddings for your content.
2. Store the embeddings in a vector database.
3. Embed the query and retrieve the most similar stored embeddings.
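As an illustration, the sketch below ranks stored vectors by cosine similarity with NumPy; a vector database performs the same comparison at scale. The toy vectors stand in for embeddings parsed from output.json.

```python
import numpy as np

def cosine_similarity(a, b):
    """Return the cosine similarity between two embedding vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors stand in for embeddings parsed from output.json.
query = [0.1, 0.3, 0.5]
stored = {"clip-1": [0.1, 0.29, 0.52], "clip-2": [0.9, -0.2, 0.1]}

# Rank stored items by similarity to the query, highest first.
ranked = sorted(stored, key=lambda k: cosine_similarity(query, stored[k]), reverse=True)
print(ranked)  # ['clip-1', 'clip-2']
```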
Request parameters and response fields
For a complete list of request parameters and response fields, see the TwelveLabs Marengo Embed 3.0 page in the Amazon Bedrock documentation.
Using Marengo 2.7
Note
Marengo 2.7 will be deprecated in a future release. Migrate to Marengo 3.0 to ensure continued support and access to new features. For details, see the Migration guide page.
Synchronous processing
Request body structure
The request body contains the following fields:
- `inputType`: The type of content. Values: `text` or `image`.
- `inputText`: The text to embed. Required for text inputs.
- `mediaSource`: The image source containing either `base64String` or `s3Location`. Required for image inputs.
Examples
Text
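A minimal sketch of a Marengo 2.7 synchronous text request, assuming the flat body fields described above; the model ID is a placeholder.

```python
import json

import boto3

client = boto3.client("bedrock-runtime", region_name="<YOUR_REGION>")

# Marengo 2.7 synchronous text request: fields sit at the top level of the body.
response = client.invoke_model(
    modelId="<MODEL_ID>",
    body=json.dumps({
        "inputType": "text",
        "inputText": "<YOUR_TEXT>",
    }),
)
print(json.loads(response["body"].read()))
```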
Image
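The corresponding image request, with `mediaSource` at the top level of the body; the `s3Location` field names are assumptions to verify against the model reference.

```python
import json

import boto3

client = boto3.client("bedrock-runtime", region_name="<YOUR_REGION>")

# Marengo 2.7 synchronous image request.
response = client.invoke_model(
    modelId="<MODEL_ID>",
    body=json.dumps({
        "inputType": "image",
        "mediaSource": {
            "s3Location": {
                "uri": "s3://<YOUR_BUCKET_NAME>/<YOUR_FILE>",
                "bucketOwner": "<YOUR_ACCOUNT_ID>",
            }
        },
    }),
)
print(json.loads(response["body"].read()))
```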
Ensure you replace the placeholders surrounded by <> with your values.
Asynchronous processing
Model input structure
The modelInput dictionary contains the following fields:
- `inputType`: The type of content. Values: `video`, `audio`, `image`, or `text`.
- `mediaSource`: The S3 location of your input file. Required for video, audio, and image inputs.
- `inputText`: The text to embed. Required for text inputs.
Examples
Video, audio, or image
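A minimal sketch of a 2.7 media job, assuming the flat `modelInput` shape described above; the polling pattern shown for Marengo 3.0 applies unchanged.

```python
import boto3

client = boto3.client("bedrock-runtime", region_name="<YOUR_REGION>")

# Marengo 2.7 asynchronous media request: mediaSource sits at the top level of modelInput.
response = client.start_async_invoke(
    modelId="<MODEL_ID>",
    modelInput={
        "inputType": "<YOUR_INPUT_TYPE>",  # "video", "audio", or "image"
        "mediaSource": {
            "s3Location": {
                "uri": "s3://<YOUR_BUCKET_NAME>/<YOUR_FILE>",
                "bucketOwner": "<YOUR_ACCOUNT_ID>",
            }
        },
    },
    outputDataConfig={
        "s3OutputDataConfig": {"s3Uri": "s3://<YOUR_BUCKET_NAME>/<OUTPUT_PREFIX>/"}
    },
)
print(response["invocationArn"])
```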
Text
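The corresponding text job:

```python
import boto3

client = boto3.client("bedrock-runtime", region_name="<YOUR_REGION>")

# Marengo 2.7 asynchronous text request.
response = client.start_async_invoke(
    modelId="<MODEL_ID>",
    modelInput={
        "inputType": "text",
        "inputText": "<YOUR_TEXT>",
    },
    outputDataConfig={
        "s3OutputDataConfig": {"s3Uri": "s3://<YOUR_BUCKET_NAME>/<OUTPUT_PREFIX>/"}
    },
)
print(response["invocationArn"])
```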
Replace the following placeholders with your values:
- `<YOUR_REGION>`: Your AWS region.
- `<YOUR_ACCOUNT_ID>`: Your AWS account ID.
- `<YOUR_BUCKET_NAME>`: The name of your S3 bucket.
- `<YOUR_FILE>`: The name of your file.
- `<YOUR_INPUT_TYPE>`: The type of media (`video`, `audio`, or `image`).
Request parameters and response fields
For a complete list of request parameters and response fields, see the TwelveLabs Marengo Embed 2.7 page in the Amazon Bedrock documentation.