Create text embeddings
This guide shows how you can create text embeddings.
The following table lists the available models for generating text embeddings and their key characteristics:
Model | Description | Dimensions | Max tokens | Similarity metric |
---|---|---|---|---|
Marengo-retrieval-2.7 | Use this model to create embeddings that you can use in various downstream tasks | 1024 | 77 | Cosine similarity |
The “Marengo-retrieval-2.7” video understanding model generates embeddings for all modalities in the same latent space. This shared space enables any-to-any searches across different types of content.
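The table lists cosine similarity as the metric, and all modalities share one latent space, so comparing embeddings means computing the cosine similarity between vectors. The minimal sketch below uses plain Python and makes no SDK calls. It ranks candidate embeddings against a query embedding. The function names and the toy three-dimensional vectors are hypothetical illustrations. Real Marengo-retrieval-2.7 embeddings have 1024 dimensions.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: List[float], candidates: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    """Sort candidate embeddings from most to least similar to the query."""
    scores = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy 3-dimensional vectors stand in for real 1024-dimensional embeddings.
query = [1.0, 0.0, 0.0]
candidates = {
    "clip_a": [0.9, 0.1, 0.0],
    "clip_b": [0.0, 1.0, 0.0],
    "clip_c": [-1.0, 0.0, 0.0],
}
for name, score in rank_by_similarity(query, candidates):
    print(f"{name}: {score:.3f}")
```

Cosine similarity ignores vector length and compares only direction. Scores therefore range from -1 (opposite) to 1 (same direction), whichever modality produced each vector.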
To create text embeddings, invoke the `create` method of the `embed` class, specifying the following parameters:
- `model_name`: The name of the video understanding model you want to use.
- `text`: The text for which you want to create an embedding.
- (Optional) `text_truncate`: Specifies the behavior for text that exceeds 77 tokens. It can take one of the following values:
  - `start`: Truncate the beginning of the text.
  - `end`: Truncate the end of the text (default).
  - `none`: Return an error if the text exceeds the token limit.
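The sketch below mimics the three `text_truncate` options on a list of tokens to make them concrete. It is only an illustration of the documented behavior, not the platform's implementation. The model's tokenizer is not described on this page, so splitting on whitespace is a stand-in assumption. `truncate_tokens` is a hypothetical helper and is not part of the SDK. With the real API, you pass one of the strings above as `text_truncate` to the `create` method.

```python
from typing import List

MAX_TOKENS = 77  # Token limit for Marengo-retrieval-2.7, from the table.


def truncate_tokens(tokens: List[str], mode: str = "end", limit: int = MAX_TOKENS) -> List[str]:
    """Mimic the text_truncate options on an already-tokenized input.

    mode="end"   keeps the first `limit` tokens (drops the end; the default).
    mode="start" keeps the last `limit` tokens (drops the beginning).
    mode="none"  raises an error if the input is too long.
    """
    if len(tokens) <= limit:
        return tokens
    if mode == "end":
        return tokens[:limit]
    if mode == "start":
        return tokens[-limit:]
    if mode == "none":
        raise ValueError(f"text has {len(tokens)} tokens; the limit is {limit}")
    raise ValueError(f"unknown mode: {mode!r}")


# Whitespace splitting is only a stand-in for the model's real tokenizer.
words = " ".join(f"w{i}" for i in range(100)).split()
print(truncate_tokens(words, "end")[-1])   # w76: the end of the text was dropped
print(truncate_tokens(words, "start")[0])  # w23: the beginning of the text was dropped
```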
The response is an object containing the following fields:
- `model_name`: The name of the model the platform has used to create this embedding.
- `text_embedding`: An object that contains the embedding.
For a description of each field in the request and response, see the Create embeddings for text, image, and audio page.
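Downstream tasks usually need the raw vector rather than the response object. The sketch below extracts the float values from a response with the same shape as the one in this guide's example (`text_embedding.segments[0].embeddings_float`). It then checks that the vector has the 1024 dimensions listed in the table. So that it runs offline, `SimpleNamespace` objects stand in for the SDK's response classes. `extract_text_vector` is a hypothetical helper and is not part of the SDK.

```python
from types import SimpleNamespace
from typing import Any, List

EXPECTED_DIMENSIONS = 1024  # Dimensions of Marengo-retrieval-2.7 embeddings, per the table.


def extract_text_vector(res: Any, expected_dims: int = EXPECTED_DIMENSIONS) -> List[float]:
    """Return the float vector of the first segment of a text embedding response."""
    text_embedding = getattr(res, "text_embedding", None)
    # getattr on None with a default returns the default, so a missing
    # text_embedding falls through to the "no segments" error below.
    segments = getattr(text_embedding, "segments", None)
    if not segments:
        raise ValueError("response contains no text embedding segments")
    vector = list(segments[0].embeddings_float)
    if len(vector) != expected_dims:
        raise ValueError(f"expected {expected_dims} values, got {len(vector)}")
    return vector


# Offline stand-in for a real response object (not the SDK's actual class).
fake_res = SimpleNamespace(
    model_name="Marengo-retrieval-2.7",
    text_embedding=SimpleNamespace(
        segments=[SimpleNamespace(embeddings_float=[0.01] * EXPECTED_DIMENSIONS)]
    ),
)
vec = extract_text_vector(fake_res)
print(len(vec))
```

With a real response, you would pass the object returned by `client.embed.create` instead of `fake_res`.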
Prerequisites
- You’re familiar with the concepts that are described on the Platform overview page.
- You have an API key. To retrieve your API key, navigate to the API Key page and log in with your credentials. Then, select the Copy icon to the right of your API key to copy it to your clipboard.
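The examples in this guide use a `<YOUR_API_KEY>` placeholder. Instead of pasting the key into source code, you may prefer to read it from an environment variable. The following is a minimal sketch. The variable name `TWELVE_LABS_API_KEY` and the `load_api_key` helper are hypothetical choices for this illustration. Do not assume that the SDK reads this variable on its own.

```python
import os


def load_api_key(var_name: str = "TWELVE_LABS_API_KEY") -> str:
    """Read the API key from an environment variable instead of hard-coding it.

    TWELVE_LABS_API_KEY is a hypothetical name chosen for this sketch.
    """
    api_key = os.environ.get(var_name, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {var_name} environment variable to your API key")
    return api_key
```

You could then create the client with `TwelveLabs(api_key=load_api_key())`.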
Example
The example code below creates a text embedding using the default behavior for handling text that is too long. Ensure you replace the placeholders surrounded by `<>` with your values.
```python
from typing import List

from twelvelabs import TwelveLabs
from twelvelabs.models.embed import SegmentEmbedding


def print_segments(segments: List[SegmentEmbedding], max_elements: int = 5):
    # Print the scope, offsets, and first few values of each embedding segment.
    for segment in segments:
        print(
            f" embedding_scope={segment.embedding_scope} start_offset_sec={segment.start_offset_sec} end_offset_sec={segment.end_offset_sec}"
        )
        print(f" embeddings: {segment.embeddings_float[:max_elements]}")


client = TwelveLabs(api_key="<YOUR_API_KEY>")

# Create the embedding. Text longer than 77 tokens is truncated at the end by default.
res = client.embed.create(
    model_name="Marengo-retrieval-2.7",
    text="<YOUR_TEXT>",
)

print("Created a text embedding")
print(f" Model: {res.model_name}")
if res.text_embedding is not None and res.text_embedding.segments is not None:
    print_segments(res.text_embedding.segments)
```
```typescript
import { TwelveLabs, SegmentEmbedding } from "twelvelabs-js";

// Print the scope, offsets, and first few values of each embedding segment.
const printSegments = (segments: SegmentEmbedding[], maxElements = 5) => {
  segments.forEach((segment) => {
    console.log(
      ` embedding_scope=${segment.embeddingScope} start_offset_sec=${segment.startOffsetSec} end_offset_sec=${segment.endOffsetSec}`
    );
    console.log(
      " embeddings: ",
      segment.embeddingsFloat.slice(0, maxElements)
    );
  });
};

const client = new TwelveLabs({ apiKey: "<YOUR_API_KEY>" });

// Create the embedding (top-level await requires running as an ES module).
const res = await client.embed.create({
  modelName: "Marengo-retrieval-2.7",
  text: "<YOUR_TEXT>",
});

console.log(`Created text embedding: modelName=${res.modelName}`);
if (res.textEmbedding?.segments) {
  printSegments(res.textEmbedding.segments);
}
```
The output should look similar to the following:
```
Created a text embedding
 Model: Marengo-retrieval-2.7
 embedding_scope=None start_offset_sec=None end_offset_sec=None
 embeddings: [...]
```