Text embeddings

This guide shows how you can create text embeddings.

The following table lists the available models for generating text embeddings and their key characteristics:

Model | Description | Dimensions | Max tokens | Similarity metric
Marengo-retrieval-2.7 | Use this model to create embeddings that you can use in various downstream tasks | 1024 | 77 | Cosine similarity

The Marengo video understanding model generates embeddings for all modalities in the same latent space. This shared space enables any-to-any searches across different types of content.
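
Because the table lists cosine similarity as the metric, two embeddings from the shared space can be compared with a few lines of standard-library Python. The following is a minimal sketch, not part of the SDK: the helper name `cosine_similarity` is hypothetical, and the toy 3-dimensional vectors stand in for real 1024-dimensional embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors (assumption: real embeddings would have 1024 values each).
query = [1.0, 0.0, 0.0]
same_direction = [2.0, 0.0, 0.0]
orthogonal = [0.0, 3.0, 0.0]

print(cosine_similarity(query, same_direction))  # 1.0
print(cosine_similarity(query, orthogonal))      # 0.0
```

A score near 1 suggests the two inputs are close in the latent space, and a score near 0 suggests they are unrelated.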

Prerequisites

  • To use the platform, you need an API key:

    1. If you don’t have an account, sign up for a free account.
    2. Go to the API Key page.
    3. Select the Copy icon next to your key.

  • Ensure the TwelveLabs SDK is installed on your computer:

    $ pip install twelvelabs

Complete example

This complete example shows how you can create text embeddings. Ensure you replace the placeholders surrounded by <> with your values.

from typing import List

from twelvelabs import TwelveLabs
from twelvelabs.types import BaseSegment

# 1. Initialize the client
client = TwelveLabs(api_key="<YOUR_API_KEY>")

# 2. Create text embeddings
res = client.embed.create(
    model_name="Marengo-retrieval-2.7",
    text="<YOUR_TEXT>",
    # text_truncate="start"
)

# 3. Process the results
def print_segments(segments: List[BaseSegment], max_elements: int = 5):
    for segment in segments:
        first_few = segment.float_[:max_elements]
        print(
            f" embeddings: [{', '.join(str(x) for x in first_few)}...] (total: {len(segment.float_)} values)"
        )

print("Created text embedding")
if res.text_embedding is not None and res.text_embedding.segments is not None:
    print_segments(res.text_embedding.segments)

Step-by-step guide

1. Import the SDK and initialize the client

Create a client instance to interact with the TwelveLabs Video Understanding Platform.
Function call: You call the constructor of the TwelveLabs class.
Parameters:

  • api_key: The API key to authenticate your requests to the platform.

Return value: An object of type TwelveLabs configured for making API calls.

2. Create text embeddings

Function call: You call the embed.create function.
Parameters:

  • model_name: The name of the model you want to use (“Marengo-retrieval-2.7”).
  • text: The text for which you wish to create an embedding.
  • (Optional) text_truncate: A string that specifies how the platform truncates text that exceeds 77 tokens to fit the maximum length allowed for an embedding. This parameter can take one of the following values:
    • start: The platform will truncate the start of the provided text.
    • end: The platform will truncate the end of the provided text. This is the default value.
    • none: The platform will return an error if the text is longer than the maximum token limit.
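
The three text_truncate values can be illustrated locally on a list of tokens. This is only a sketch of the behavior described above, under stated assumptions: the function `truncate_tokens` is hypothetical, "truncate the start" is read as dropping the leading tokens and keeping the tail, and whitespace splitting stands in for the platform's real tokenizer, which will count tokens differently.

```python
from typing import List

MAX_TOKENS = 77  # limit stated in this guide


def truncate_tokens(tokens: List[str], mode: str = "end", limit: int = MAX_TOKENS) -> List[str]:
    """Mimic the documented truncation modes on an already-tokenized text."""
    if mode not in ("start", "end", "none"):
        raise ValueError(f"unknown mode: {mode!r}")
    if len(tokens) <= limit:
        return list(tokens)       # nothing to truncate
    if mode == "end":
        return tokens[:limit]     # drop the end, keep the first `limit` tokens
    if mode == "start":
        return tokens[-limit:]    # drop the start, keep the last `limit` tokens
    # mode == "none": over-long text is rejected instead of truncated
    raise ValueError(f"text has {len(tokens)} tokens, limit is {limit}")


# Whitespace splitting is an assumption used only for illustration.
words = "a b c d e".split()
print(truncate_tokens(words, "end", limit=3))    # ['a', 'b', 'c']
print(truncate_tokens(words, "start", limit=3))  # ['c', 'd', 'e']
```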

Return value: The response contains the following fields:

  • text_embedding: An object that contains the embedding data for your text. It includes the following fields:
    • segments: A list of segment objects. Each segment contains the following:
      • float_: An array of floats representing the embedding
    • metadata: An object that contains metadata about the embedding.
  • model_name: The name of the video understanding model the platform has used to create this embedding.
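
The nested fields above can be flattened into plain Python lists before further processing. The sketch below assumes only the field names listed in this guide (text_embedding, segments, float_). The helper `extract_vectors` is hypothetical, and the `SimpleNamespace` object is a stand-in for a real response so that the snippet runs without an API key.

```python
from types import SimpleNamespace
from typing import Any, List


def extract_vectors(res: Any) -> List[List[float]]:
    """Collect the float_ array of every segment, skipping missing fields."""
    text_embedding = getattr(res, "text_embedding", None)
    if text_embedding is None:
        return []
    segments = getattr(text_embedding, "segments", None) or []
    return [
        list(segment.float_)
        for segment in segments
        if getattr(segment, "float_", None) is not None
    ]


# Stand-in response (assumption: a real one would hold 1024 floats per segment).
fake_res = SimpleNamespace(
    model_name="Marengo-retrieval-2.7",
    text_embedding=SimpleNamespace(
        segments=[SimpleNamespace(float_=[0.1, 0.2, 0.3])],
        metadata=None,
    ),
)

vectors = extract_vectors(fake_res)
print(len(vectors), len(vectors[0]))  # 1 3
```

Returning an empty list when a field is missing mirrors the None checks in the complete example.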

3. Process the results

This example prints the results to the standard output.