Create text embeddings

This guide shows how you can create text embeddings.

The following table lists the available models for generating text embeddings and their key characteristics:

Model | Description | Dimensions | Max tokens | Similarity metric
Marengo-retrieval-2.7 | Use this model to create embeddings that you can use in various downstream tasks | 1024 | 77 | Cosine similarity

The “Marengo-retrieval-2.7” video understanding model generates embeddings for all modalities in the same latent space. This shared space enables any-to-any searches across different types of content.
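
Because the similarity metric for this model is cosine similarity, comparing two embeddings from the shared space comes down to one small computation. The following is a minimal sketch rather than part of the SDK: the helper names `cosine_similarity` and `rank_by_similarity` are hypothetical, and the three-element vectors are toy values standing in for real embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Sort candidate embeddings from most to least similar to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy vectors (hypothetical values, not real model output).
query_vec = [1.0, 0.0, 0.0]
candidates = {
    "close": [0.9, 0.1, 0.0],
    "orthogonal": [0.0, 1.0, 0.0],
    "opposite": [-1.0, 0.0, 0.0],
}
ranking = rank_by_similarity(query_vec, candidates)
print([name for name, _ in ranking])
```

In practice, the query vector would be a text embedding and the candidates would be embeddings of other content created with the same model, since only vectors from the same latent space are comparable.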

To create text embeddings, invoke the create method of the embed class, specifying the following parameters:

  • model_name: The name of the video understanding model you want to use.
  • text: The text for which you want to create an embedding.
  • (Optional) text_truncate: Specifies the behavior for text that exceeds 77 tokens. It can take one of the following values:
    • start: Truncate the beginning of the text.
    • end: Truncate the end of the text (default).
    • none: Return an error if the text exceeds the token limit.
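
The three text_truncate values can be illustrated locally with a short sketch. It only models the behavior described above and makes loud assumptions: the text is already split into a list of tokens (the platform's actual tokenizer is not reproduced here), and the helper name `truncate_tokens` is hypothetical, not an SDK function.

```python
from typing import List

MAX_TOKENS = 77  # token limit stated in this guide


def truncate_tokens(tokens: List[str], mode: str = "end", limit: int = MAX_TOKENS) -> List[str]:
    """Mimic the described text_truncate behavior on a pre-tokenized input.

    Assumption: `tokens` stands in for the model's real tokenization.
    """
    if mode not in ("start", "end", "none"):
        raise ValueError(f"unknown mode: {mode!r}")
    if len(tokens) <= limit:
        return list(tokens)  # within the limit: nothing to truncate
    if mode == "end":
        return tokens[:limit]  # drop tokens from the end, keep the beginning
    if mode == "start":
        return tokens[len(tokens) - limit:]  # drop tokens from the beginning, keep the end
    # mode == "none": exceeding the limit is an error
    raise ValueError(f"text has {len(tokens)} tokens, which exceeds the limit of {limit}")


tokens = [str(i) for i in range(80)]  # 80 placeholder tokens, 3 over the limit
kept_end = truncate_tokens(tokens, "end")
kept_start = truncate_tokens(tokens, "start")
print(len(kept_end), kept_end[0], kept_end[-1])
print(len(kept_start), kept_start[0], kept_start[-1])
```

With the default value, the beginning of a long text is what gets embedded. Choosing start makes sense when the most relevant part is at the end, and none is useful when silently dropping text would be worse than handling an error.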

The response is an object containing the following fields:

  • model_name: The name of the model the platform has used to create this embedding.
  • text_embedding: An object that contains the embedding.

For a description of each field in the request and response, see the Create embeddings for text, image, and audio page.

Prerequisites

  • You’re familiar with the concepts that are described on the Platform overview page.
  • You have an API key. To retrieve your API key, navigate to the API Key page and log in with your credentials. Then, select the Copy icon to the right of your API key to copy it to your clipboard.

Example

The example code below creates a text embedding using the default behavior for handling text that is too long. Ensure you replace the placeholders surrounded by <> with your values.

from twelvelabs import TwelveLabs
from typing import List
from twelvelabs.models.embed import SegmentEmbedding


def print_segments(segments: List[SegmentEmbedding], max_elements: int = 5):
    # Print each segment's metadata and the first few values of its embedding.
    for segment in segments:
        print(
            f" embedding_scope={segment.embedding_scope} start_offset_sec={segment.start_offset_sec} end_offset_sec={segment.end_offset_sec}"
        )
        print(f" embeddings: {segment.embeddings_float[:max_elements]}")


client = TwelveLabs(api_key="<YOUR_API_KEY>")

# text_truncate is omitted, so the default behavior (end) applies.
res = client.embed.create(
    model_name="Marengo-retrieval-2.7",
    text="<YOUR_TEXT>",
)

print("Created a text embedding")
print(f" Model: {res.model_name}")
if res.text_embedding is not None and res.text_embedding.segments is not None:
    print_segments(res.text_embedding.segments)

The output should look similar to the following:

Created a text embedding
Model: Marengo-retrieval-2.7
embedding_scope=None start_offset_sec=None end_offset_sec=None
embeddings: [, -, 0, ., 0, (truncated)]