Text and image embeddings
This guide shows how to create text and image embeddings using the Marengo video understanding model. For a list of available versions, complete specifications, and input requirements, see the Marengo page.
The Marengo video understanding model generates embeddings for all modalities in the same latent space. This shared space enables any-to-any searches across different types of content.
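Because all modalities live in one latent space, a single distance metric compares any pair of embeddings. The sketch below uses cosine similarity on tiny hand-made vectors; real Marengo embeddings have far more dimensions, and the values here are illustrative only:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for embeddings of different modalities.
text_embedding  = [0.9, 0.1, 0.0]   # e.g. the text "a red apple"
image_embedding = [0.8, 0.2, 0.1]   # e.g. a photo of an apple
video_embedding = [0.1, 0.9, 0.3]   # e.g. an unrelated video clip

# The same comparison works text-to-image, text-to-video, and so on.
print(cosine_similarity(text_embedding, image_embedding))  # high
print(cosine_similarity(text_embedding, video_embedding))  # low
```

This is what enables any-to-any search: rank candidates of any modality by their similarity to a single query embedding.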
For details on how your usage is measured and billed, see the Pricing page.
Key concepts
This section explains the key concepts and terminology used in this guide:
- Asset: Your uploaded content.
- Embedding: A vector representation of your content.
Workflow
To create text and image embeddings, provide your image and text content to the platform. You can upload your image files as assets, provide a publicly accessible URL, or use base64-encoded data. The platform combines both the visual content from your image and the semantic meaning from your text into a single vector representation. Use these embeddings for similarity search, content classification, clustering, recommendations, or building Retrieval-Augmented Generation (RAG) systems.
This guide demonstrates how to create embeddings by uploading your image file as an asset. This approach is the most flexible because you can reuse assets across multiple operations. Alternatively, you can provide a publicly accessible URL or base64-encoded image data inline to skip the upload step.
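If you choose the inline option instead, the image must be base64-encoded. A minimal sketch using the standard library, assuming the field accepts standard base64 text (the field name `base_64_string` comes from this guide):

```python
import base64

# In practice, read your image file in binary mode:
#   image_bytes = open("photo.jpg", "rb").read()
image_bytes = b"\x89PNG\r\n\x1a\n"  # placeholder bytes for illustration

# Standard base64 encoding, decoded to a str for JSON transport.
base_64_string = base64.b64encode(image_bytes).decode("ascii")
```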
Prerequisites
- To use the platform, you need an API key.
- Install the TwelveLabs SDK for the programming language you are using.
- Your image files must meet the requirements.
Complete example
Copy and paste the code below, replacing the placeholders surrounded by <> with your values.
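The following is a hedged sketch of the full workflow, using the class and method names described in this guide (`TwelveLabs`, `assets.create`, `embed.v_2.create`); treat the exact signatures as assumptions and confirm them against the SDK reference:

```python
import os

def create_text_image_embedding(client, image_url: str, text: str):
    """Upload an image by URL, then embed it together with `text`.

    Method and parameter names follow this guide; verify exact
    signatures against the TwelveLabs SDK reference.
    """
    # Step 1: upload the image to create a reusable asset.
    asset = client.assets.create(method="url", url=image_url)
    # Step 2: create a combined text-and-image embedding.
    response = client.embed.v_2.create(
        model_name="marengo3.0",
        input_type="text_image",
        text_image={
            "media_source": {"asset_id": asset.id},
            "input_text": text,
        },
    )
    return response.data  # list of embedding objects

if __name__ == "__main__" and os.environ.get("TWELVELABS_API_KEY"):
    from twelvelabs import TwelveLabs  # assumed package name
    client = TwelveLabs(api_key=os.environ["TWELVELABS_API_KEY"])
    embeddings = create_text_image_embedding(
        client, "<YOUR_IMAGE_URL>", "<YOUR_TEXT>"
    )
    for item in embeddings:
        print(item.embedding_option, len(item.embedding))
```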
Code explanation
The following explanation uses the Python SDK; the Node.js SDK follows the same structure.
Import the SDK and initialize the client
Create a client instance to interact with the TwelveLabs Video Understanding Platform.
Function call: You call the constructor of the TwelveLabs class.
Parameters:
api_key: The API key to authenticate your requests to the platform.
Return value: An object of type TwelveLabs configured for making API calls.
Upload an image
Upload an image file to create an asset. For details about the available upload methods and the corresponding limits, see the Upload methods page.
Function call: You call the assets.create function.
Parameters:
method: The upload method for your asset. Use url for a publicly accessible URL or direct to upload a local file. This example uses url.
url or file: The publicly accessible URL of your image file or an opened file object in binary read mode. This example uses url.
Return value: An object of type Asset. This object contains, among other information, a field named id representing the unique identifier of your asset.
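The two upload methods above can be wrapped in a small helper. This is a sketch, not the SDK's own API; the method and parameter names follow this guide, but confirm exact signatures against the SDK reference:

```python
def upload_image(client, source: str):
    """Create an asset from a public URL (`url`) or a local file (`direct`).

    `client` is an initialized TwelveLabs client; `source` is either a
    URL or a local file path.
    """
    if source.startswith(("http://", "https://")):
        return client.assets.create(method="url", url=source)
    # Local path: open in binary read mode, as the guide requires.
    with open(source, "rb") as f:
        return client.assets.create(method="direct", file=f)
```

The returned Asset object's id field can then be passed to the embedding call.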
Create text and image embeddings
Function call: You call the embed.v_2.create function.
Parameters:
input_type: The type of content. Set this parameter to text_image.
model_name: The model you want to use. This example uses marengo3.0.
text_image: A TextImageInputRequest object containing the following properties:
- media_source: An object specifying the source of the image file. Specify one of the following:
  - asset_id: The unique identifier of an asset from a previous upload.
  - url: The publicly accessible URL of the image file.
  - base_64_string: The base64-encoded image data.
  This example uses the asset ID from the previous step.
- input_text: The text for which you wish to create an embedding.
Return value: An object of type EmbeddingSuccessResponse containing a field named data, which is a list of embedding objects. Each embedding object includes the following fields:
embedding: An array of floats representing the embedding vector.
embedding_option: The type of embedding generated.
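To illustrate the response shape, the sketch below iterates over data using mock objects; these stand-ins mimic the structure described above, not real API output:

```python
from types import SimpleNamespace

# Mock objects mimicking the EmbeddingSuccessResponse shape described
# in this guide; a real response comes from embed.v_2.create.
response = SimpleNamespace(data=[
    SimpleNamespace(
        embedding=[0.12, -0.05, 0.33],
        embedding_option="text_image",
    ),
])

for item in response.data:
    # Each item carries the vector and the type of embedding generated.
    print(item.embedding_option, len(item.embedding))
```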