Create text embeddings

This guide shows how you can create text embeddings.

The following table lists the available models for generating text embeddings and their key characteristics:

| Model | Description | Dimensions | Max tokens | Similarity metric |
| --- | --- | --- | --- | --- |
| Marengo-retrieval-2.6 | Use this model to create embeddings that you can use in various downstream tasks | 1024 | 77 | Cosine similarity |

The “Marengo-retrieval-2.6” video understanding engine generates embeddings for all modalities in the same latent space. This shared space enables any-to-any searches across different types of content.
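Because the embeddings share one latent space and the model's similarity metric is cosine similarity, two embeddings can be compared directly. The following is a minimal sketch of that comparison in plain Python. The function name `cosine_similarity` and the toy three-dimensional vectors are illustrative assumptions and not part of the platform's SDK; real embeddings from this model have 1024 dimensions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors stand in for real 1024-dimensional embeddings.
query = [1.0, 0.0, 0.0]
same_direction = [2.0, 0.0, 0.0]
orthogonal = [0.0, 1.0, 0.0]

print(cosine_similarity(query, same_direction))  # vectors point the same way
print(cosine_similarity(query, orthogonal))      # vectors are unrelated
```

Values close to 1 indicate that two pieces of content are semantically similar, and values near 0 indicate that they are unrelated.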

To create a text embedding, call the create method of the embed class (client.embed.create in the example on this page) with the following parameters:

  • engine_name: The name of the video understanding engine you want to use.
  • text: The text for which you want to create an embedding.
  • (Optional) text_truncate: Specifies the behavior for text that exceeds 77 tokens. It can take one of the following values:
    • start: Truncate the beginning of the text.
    • end: Truncate the end of the text (default).
    • none: Return an error if the text exceeds the token limit.
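The three text_truncate values can be illustrated locally. The sketch below mimics the described behavior on a list of tokens. It is an illustration only: `truncate_tokens` is a hypothetical helper, the whitespace split is a stand-in for the platform's actual tokenizer (which this guide does not describe), and the platform performs the real truncation on the server side.

```python
from typing import List


def truncate_tokens(tokens: List[str], limit: int = 77, mode: str = "end") -> List[str]:
    """Mimic the documented text_truncate modes on an already tokenized text."""
    if mode not in ("start", "end", "none"):
        raise ValueError(f"unknown mode: {mode!r}")
    if len(tokens) <= limit:
        return list(tokens)  # within the limit, nothing to truncate
    if mode == "start":
        # Truncate the beginning: keep the last `limit` tokens.
        return tokens[len(tokens) - limit:]
    if mode == "end":
        # Truncate the end: keep the first `limit` tokens.
        return tokens[:limit]
    # mode == "none": over-long text is an error.
    raise ValueError("text exceeds the token limit")


# Whitespace splitting is an assumption made for illustration only.
tokens = "a b c d e".split()
print(truncate_tokens(tokens, limit=3, mode="start"))
print(truncate_tokens(tokens, limit=3, mode="end"))
```

In a real request, you would choose the behavior by passing the text_truncate parameter to the create method rather than truncating the text yourself.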

The response is an object containing the following fields:

  • engine_name: The name of the engine the platform has used to create this embedding.
  • text_embedding: An object that contains the embedding.

For a description of each field in the request and response, see the Create embeddings for text, image, and audio page.
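As a rough sketch of how the response might be consumed, the helper below pulls the first embedding vector out of a response-shaped object and returns None when any level is missing. The attribute names segments and embeddings_float are taken from the example on this page; `first_embedding` is a hypothetical helper, and the `SimpleNamespace` object is a stand-in used so that the snippet runs without calling the API.

```python
from types import SimpleNamespace
from typing import Any, List, Optional


def first_embedding(res: Any) -> Optional[List[float]]:
    """Return the first segment's vector, or None if the response has none."""
    text_embedding = getattr(res, "text_embedding", None)
    if text_embedding is None:
        return None
    segments = getattr(text_embedding, "segments", None)
    if not segments:
        return None
    return list(segments[0].embeddings_float)


# Stand-in for a real response; the values are made up for illustration.
fake_res = SimpleNamespace(
    engine_name="Marengo-retrieval-2.6",
    text_embedding=SimpleNamespace(
        segments=[SimpleNamespace(embeddings_float=[0.1, 0.2, 0.3])]
    ),
)

vector = first_embedding(fake_res)
print(fake_res.engine_name, len(vector))
```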

Prerequisites

  • You’re familiar with the concepts that are described on the Platform overview page.
  • To use the platform, you need an API key:
    1. If you don’t have an account, sign up for a free account.
    2. Go to the API Key page.
    3. Select the Copy icon next to your key.
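Once you have copied the key, one common way to keep it out of source code is to read it from an environment variable. The sketch below shows that pattern. The variable name `TWELVE_LABS_API_KEY` and the helper `load_api_key` are assumptions chosen for illustration; the platform does not require any particular name.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable name; use whatever name fits your environment.
API_KEY_ENV_VAR = "TWELVE_LABS_API_KEY"


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Read the API key from the environment and fail early if it is missing."""
    env = os.environ if env is None else env
    key = env.get(API_KEY_ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"Set the {API_KEY_ENV_VAR} environment variable first")
    return key


# Usage (not executed here because it needs the variable to be set):
# api_key = load_api_key()
```

The returned value can then be passed wherever the example on this page uses the <YOUR_API_KEY> placeholder.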

Example

The example code below creates a text embedding using the default truncation behavior (end) for text that exceeds the token limit. Ensure you replace the placeholders surrounded by <> with your values.

from typing import List

from twelvelabs import TwelveLabs
from twelvelabs.models.embed import SegmentEmbedding


def print_segments(segments: List[SegmentEmbedding], max_elements: int = 5):
    # Print each segment's metadata and the first few values of its vector.
    for segment in segments:
        print(
            f" embedding_scope={segment.embedding_scope} start_offset_sec={segment.start_offset_sec} end_offset_sec={segment.end_offset_sec}"
        )
        print(f" embeddings: {segment.embeddings_float[:max_elements]}")


client = TwelveLabs(api_key="<YOUR_API_KEY>")

# text_truncate is omitted, so the default behavior (end) applies.
res = client.embed.create(
    engine_name="Marengo-retrieval-2.6",
    text="<YOUR_TEXT>",
)

print("Created a text embedding")
print(f" Engine: {res.engine_name}")
# Guard against a response that carries no text embedding.
if res.text_embedding is not None and res.text_embedding.segments is not None:
    print_segments(res.text_embedding.segments)

The output should look similar to the following:

Created a text embedding
Engine: Marengo-retrieval-2.6
embedding_scope=None start_offset_sec=None end_offset_sec=None
embeddings: [, -, 0, ., 0, (truncated)]