Create audio embeddings
This guide shows how you can create audio embeddings.
The following table lists the available models for generating audio embeddings and their key characteristics:
Model | Description | Dimensions | Max length | Similarity metric |
---|---|---|---|---|
Marengo-retrieval-2.7 | Use this model to create embeddings that you can use in various downstream tasks | 1024 | 10 seconds | Cosine similarity |
Note that the “Marengo-retrieval-2.7” video understanding model generates embeddings for all modalities in the same latent space. This shared space enables any-to-any searches across different types of content.
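Because all modalities land in the same space, you can compare an audio embedding against, say, a text embedding directly using cosine similarity. The sketch below is illustrative rather than canonical: it assumes numpy is installed, the query text is hypothetical, and it assumes the text response exposes `segments` with `embeddings_float` the same way the audio response does.

```python
import numpy as np
from twelvelabs import TwelveLabs

client = TwelveLabs(api_key="<YOUR_API_KEY>")

# Embed an audio clip and a text query into the shared latent space.
audio_res = client.embed.create(
    model_name="Marengo-retrieval-2.7",
    audio_file="<YOUR_AUDIO_FILE>",
)
text_res = client.embed.create(
    model_name="Marengo-retrieval-2.7",
    text="dog barking in the distance",  # hypothetical query text
)

# Assumption: both responses expose segments with `embeddings_float`.
audio_vec = np.array(audio_res.audio_embedding.segments[0].embeddings_float)
text_vec = np.array(text_res.text_embedding.segments[0].embeddings_float)

# Cosine similarity; higher means the audio is semantically closer to the query.
score = np.dot(audio_vec, text_vec) / (
    np.linalg.norm(audio_vec) * np.linalg.norm(text_vec)
)
print(f"cosine similarity: {score:.4f}")
```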
The platform processes audio files up to 10 seconds in length. Files longer than 10 seconds are automatically truncated.
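If your source audio runs longer than 10 seconds, you can avoid losing content to truncation by splitting the file client-side and embedding each piece. A minimal sketch, assuming pydub (and its ffmpeg dependency) is installed; the chunk naming scheme is arbitrary:

```python
from pydub import AudioSegment  # assumption: pydub and ffmpeg are installed
from twelvelabs import TwelveLabs

client = TwelveLabs(api_key="<YOUR_API_KEY>")
audio = AudioSegment.from_file("<YOUR_AUDIO_FILE>")

CHUNK_MS = 10 * 1000  # anything past 10 seconds would be truncated server-side
for i, start in enumerate(range(0, len(audio), CHUNK_MS)):
    chunk_path = f"chunk_{i}.wav"  # hypothetical naming scheme
    audio[start:start + CHUNK_MS].export(chunk_path, format="wav")
    res = client.embed.create(
        model_name="Marengo-retrieval-2.7",
        audio_file=chunk_path,
    )
    print(f"chunk {i} ({start / 1000:.0f}s): model_name={res.model_name}")
```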
To create an embedding for an audio file, you must specify at least the following parameters:
`model_name`
: The name of the video understanding model you want to use. Example: "Marengo-retrieval-2.7".

`audio_url` or `audio_file`
: The publicly accessible URL of your audio file or the path of your audio file. You must provide at least one of these parameters. If you specify both, the `audio_url` parameter takes precedence.
The response is an object containing the following fields:
`model_name`
: The name of the model the platform has used to create this embedding.

`audio_embedding`
: An object that contains your embeddings and additional information.
For a description of each field in the request and response, see the Create embeddings for text, image, and audio page.
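As a quick illustration of the request and response fields described above, the following sketch embeds a file by URL (the URL is a placeholder) and reads back the two response fields:

```python
from twelvelabs import TwelveLabs

client = TwelveLabs(api_key="<YOUR_API_KEY>")

# audio_url takes precedence over audio_file when both are given.
res = client.embed.create(
    model_name="Marengo-retrieval-2.7",
    audio_url="https://example.com/audio.mp3",  # placeholder URL
)

print(res.model_name)       # name of the model that created the embedding
print(res.audio_embedding)  # object holding the embeddings and segment metadata
```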
Prerequisites
- You’re familiar with the concepts that are described on the Platform overview page.
- You have an API key. To retrieve your API key, navigate to the API Key page and log in with your credentials. Then, select the Copy icon to the right of your API key to copy it to your clipboard.
- The audio files you wish to use must meet the following requirements (a quick local pre-flight check is sketched after this list):
  - Format: WAV (uncompressed), MP3 (lossy), or FLAC (lossless)
  - File size: Must not exceed 10MB.
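Since files that miss these requirements will fail, it can be handy to validate locally before uploading. A small, hypothetical pre-flight check whose helper name and limits simply mirror the list above:

```python
import os

# Hypothetical pre-flight check mirroring the requirements above.
ALLOWED_EXTENSIONS = {".wav", ".mp3", ".flac"}
MAX_BYTES = 10 * 1024 * 1024  # 10MB

def check_audio_file(path: str) -> None:
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported format: {ext}")
    if os.path.getsize(path) > MAX_BYTES:
        raise ValueError("File exceeds the 10MB limit")

check_audio_file("<YOUR_AUDIO_FILE>")
```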
Example
The example code below creates an audio embedding. Ensure you replace the placeholders surrounded by `<>` with your values:
```python
from typing import List

from twelvelabs import TwelveLabs
from twelvelabs.models.embed import SegmentEmbedding


def print_segments(segments: List[SegmentEmbedding], max_elements: int = 5):
    # Print each segment's scope, time range, and the first few embedding values.
    for segment in segments:
        print(
            f"  embedding_scope={segment.embedding_scope} start_offset_sec={segment.start_offset_sec} end_offset_sec={segment.end_offset_sec}"
        )
        print(f"  embeddings: {segment.embeddings_float[:max_elements]}")


client = TwelveLabs(api_key="<YOUR_API_KEY>")

# Create an audio embedding from a local file.
res = client.embed.create(
    model_name="Marengo-retrieval-2.7",
    audio_file="<YOUR_AUDIO_FILE>",
)

print(f"Created audio embedding: model_name={res.model_name}")
if res.audio_embedding is not None and res.audio_embedding.segments is not None:
    print_segments(res.audio_embedding.segments)
```
```typescript
import { TwelveLabs, SegmentEmbedding } from "twelvelabs-js";

// Print each segment's scope, time range, and the first few embedding values.
const printSegments = (segments: SegmentEmbedding[], maxElements = 5) => {
  segments.forEach((segment) => {
    console.log(
      `  embedding_scope=${segment.embeddingScope} start_offset_sec=${segment.startOffsetSec} end_offset_sec=${segment.endOffsetSec}`
    );
    console.log("  embeddings: ", segment.embeddingsFloat.slice(0, maxElements));
  });
};

const client = new TwelveLabs({ apiKey: "<YOUR_API_KEY>" });

// Create an audio embedding from a local file.
const res = await client.embed.create({
  modelName: "Marengo-retrieval-2.7",
  audioFile: "<YOUR_AUDIO_FILE>",
});

console.log(`Created audio embedding: modelName=${res.modelName}`);
if (res.audioEmbedding?.segments) {
  printSegments(res.audioEmbedding.segments);
}
```