Create audio embeddings

This guide shows how you can create audio embeddings.

The following table lists the available models for generating audio embeddings and their key characteristics:

Model | Description | Dimensions | Max length | Similarity metric
Marengo-retrieval-2.6 | Use this model to create embeddings that you can use in various downstream tasks | 1024 | 10 seconds | Cosine similarity

Note that the “Marengo-retrieval-2.6” video understanding engine generates embeddings for all modalities in the same latent space. This shared space enables any-to-any searches across different types of content.
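The table lists cosine similarity as the metric, and all modalities share one latent space. Comparing an audio embedding with a text, image, or video embedding therefore comes down to a cosine score. The following plain-Python sketch illustrates this and is not part of the Twelve Labs SDK. The helper names `cosine_similarity` and `rank_by_similarity` are hypothetical, and the short toy vectors stand in for real 1024-dimensional embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical helper: cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Hypothetical helper: score every candidate against the query, best match first."""
    scores = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy 3-dimensional vectors; real Marengo-retrieval-2.6 embeddings have 1024 dimensions.
audio_query = [0.9, 0.1, 0.0]
library = {
    "text:dog barking": [1.0, 0.0, 0.0],
    "video:city traffic": [0.0, 1.0, 0.0],
    "image:rain": [0.0, 0.0, 1.0],
}
for name, score in rank_by_similarity(audio_query, library):
    print(f"{name}: {score:.3f}")
```

Because every modality lives in the same space, the candidates can mix text, image, and video embeddings. This is what makes an audio-to-anything search possible.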

To create an embedding for an audio file, you must specify at least the following parameters:

  • engine_name: The name of the video understanding engine you want to use. Example: "Marengo-retrieval-2.6".
  • audio_url or audio_file: The publicly accessible URL of your audio file (audio_url) or the path to a local audio file (audio_file). You must provide at least one of these parameters. If you specify both, the audio_url parameter takes precedence.
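You can mirror these rules on the client side before calling the API. At least one audio source is required, and `audio_url` wins when both are given. The following helper, `build_embed_kwargs`, is hypothetical and not part of the SDK. It assembles keyword arguments in the snake_case form used by the Python example in this guide. Passing `audio_url` as a keyword to `client.embed.create` is an assumption that follows the parameter list above.

```python
from typing import Dict, Optional


def build_embed_kwargs(
    engine_name: str,
    audio_url: Optional[str] = None,
    audio_file: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper: build kwargs for an audio embedding request.

    At least one audio source is required; audio_url takes precedence
    when both are supplied, so audio_file is dropped in that case.
    """
    if not engine_name:
        raise ValueError("engine_name is required")
    if audio_url:
        # The URL wins, matching the documented precedence rule.
        return {"engine_name": engine_name, "audio_url": audio_url}
    if audio_file:
        return {"engine_name": engine_name, "audio_file": audio_file}
    raise ValueError("provide audio_url or audio_file")


kwargs = build_embed_kwargs(
    "Marengo-retrieval-2.6",
    audio_url="https://example.com/clip.mp3",
    audio_file="clip.mp3",
)
print(kwargs)
```

With a real client, you would then call `client.embed.create(**kwargs)`. Dropping `audio_file` when a URL is present only makes the precedence explicit in your own code.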

The response is an object containing the following fields:

  • engine_name: The name of the engine the platform has used to create this embedding.
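  • audio_embedding: An object that contains your embeddings, split into segments. Each segment includes its embedding scope, its start and end offsets in seconds, and the embedding vector.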
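To sanity-check a response against the model table, you can confirm two things: the engine name matches your request, and every segment vector has 1024 dimensions. This is a sketch. The attribute names (`engine_name`, `audio_embedding`, `segments`, `embeddings_float`, `start_offset_sec`, `end_offset_sec`) are the ones the Python example in this guide reads from the SDK response. The function `check_audio_response` and the `SimpleNamespace` stand-in used to exercise it are hypothetical.

```python
from types import SimpleNamespace
from typing import Any, List, Tuple

EXPECTED_DIMENSIONS = 1024  # From the model table for Marengo-retrieval-2.6.


def check_audio_response(res: Any, engine_name: str) -> List[Tuple[float, float]]:
    """Hypothetical helper: validate a response and return each segment's (start, end).

    Raises ValueError if the engine differs, no audio segments are present,
    or a vector does not have the expected number of dimensions.
    """
    if res.engine_name != engine_name:
        raise ValueError(f"unexpected engine: {res.engine_name}")
    audio = res.audio_embedding
    if audio is None or not audio.segments:
        raise ValueError("response contains no audio segments")
    offsets = []
    for segment in audio.segments:
        dims = len(segment.embeddings_float)
        if dims != EXPECTED_DIMENSIONS:
            raise ValueError(f"expected {EXPECTED_DIMENSIONS} dimensions, got {dims}")
        offsets.append((segment.start_offset_sec, segment.end_offset_sec))
    return offsets


# Stand-in object shaped like the SDK response, for illustration only.
fake_res = SimpleNamespace(
    engine_name="Marengo-retrieval-2.6",
    audio_embedding=SimpleNamespace(
        segments=[
            SimpleNamespace(
                start_offset_sec=0.0, end_offset_sec=10.0, embeddings_float=[0.0] * 1024
            )
        ]
    ),
)
print(check_audio_response(fake_res, "Marengo-retrieval-2.6"))
```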

For a description of each field in the request and response, see the Create embeddings for text, image, and audio page.

Prerequisites

  • You’re familiar with the concepts that are described on the Platform overview page.
  • You have an API key. To retrieve your API key, navigate to the API Key page and log in with your credentials. Then, select the Copy icon to the right of your API key to copy it to your clipboard.
  • The audio files you wish to use must meet the following requirements:
    • Format: WAV (uncompressed), MP3 (lossy), or FLAC (lossless)
    • Duration: Must not exceed 10 seconds.
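You can catch files that would be rejected before you upload them. The following sketch uses only the Python standard library. It checks the file extension against the three listed formats and, for WAV files, reads the duration with the `wave` module. Treating the extension as the format is a simplifying assumption. The standard library cannot read MP3 or FLAC durations, so those files are checked by extension only. `check_audio_file` is a hypothetical helper, not an SDK function.

```python
import os
import tempfile
import wave
from pathlib import Path

ALLOWED_EXTENSIONS = {".wav", ".mp3", ".flac"}
MAX_DURATION_SEC = 10.0


def check_audio_file(path: str) -> None:
    """Hypothetical helper: raise ValueError if a file obviously breaks the limits."""
    suffix = Path(path).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported format: {suffix or '(none)'}")
    if suffix == ".wav":
        # wave reads uncompressed PCM WAV headers; duration = frames / frame rate.
        with wave.open(path, "rb") as wav:
            duration = wav.getnframes() / wav.getframerate()
        if duration > MAX_DURATION_SEC:
            raise ValueError(f"audio is {duration:.2f} s; the limit is {MAX_DURATION_SEC} s")


# Usage: write a 2-second silent mono WAV file to a temp folder and validate it.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "silence.wav")
    with wave.open(sample, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)  # 16-bit samples
        out.setframerate(16000)
        out.writeframes(b"\x00\x00" * 16000 * 2)
    check_audio_file(sample)
    print("silence.wav passed the checks")
```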

Example

The example code below creates an audio embedding using the default behavior for handling audio files that are too long. Ensure you replace the placeholders surrounded by <> with your values:

Python:

from typing import List

from twelvelabs import TwelveLabs
from twelvelabs.models.embed import SegmentEmbedding


def print_segments(segments: List[SegmentEmbedding], max_elements: int = 5):
    """Print the scope, time offsets, and first few embedding values of each segment."""
    for segment in segments:
        print(
            f"  embedding_scope={segment.embedding_scope} start_offset_sec={segment.start_offset_sec} end_offset_sec={segment.end_offset_sec}"
        )
        print(f"  embeddings: {segment.embeddings_float[:max_elements]}")


# Authenticate with your API key.
client = TwelveLabs(api_key="<YOUR_API_KEY>")

# Create an embedding for a local audio file.
# To use a publicly accessible URL instead, provide the audio_url parameter.
res = client.embed.create(
    engine_name="Marengo-retrieval-2.6",
    audio_file="<YOUR_AUDIO_FILE>"
)
print(f"Created audio embedding: engine_name={res.engine_name}")

# Print the segments only if the response contains audio embeddings.
if res.audio_embedding is not None and res.audio_embedding.segments is not None:
    print_segments(res.audio_embedding.segments)

Node.js:

import { TwelveLabs, SegmentEmbedding } from "twelvelabs-js";

// Print the scope, time offsets, and first few embedding values of each segment.
const printSegments = (segments: SegmentEmbedding[], maxElements = 5) => {
  segments.forEach((segment) => {
    console.log(
      `  embedding_scope=${segment.embeddingScope} start_offset_sec=${segment.startOffsetSec} end_offset_sec=${segment.endOffsetSec}`
    );
    console.log(
      "  embeddings: ",
      segment.embeddingsFloat.slice(0, maxElements)
    );
  });
};

// Authenticate with your API key.
const client = new TwelveLabs({ apiKey: '<YOUR_API_KEY>' });

// Create an embedding for a local audio file.
// Top-level await requires an ES module; otherwise, wrap this code in an async function.
const res = await client.embed.create({
  engineName: 'Marengo-retrieval-2.6',
  audioFile: '<YOUR_AUDIO_FILE>',
});

console.log(`Created audio embedding: engineName=${res.engineName}`);

// Print the segments only if the response contains audio embeddings.
if (res.audioEmbedding?.segments) {
  printSegments(res.audioEmbedding.segments);
}
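To embed several short clips, you can call the same `client.embed.create` method once per file, as the Python example does for a single file. The following loop is a sketch, not an SDK batch feature. `embed_files` is a hypothetical helper that keeps only the first segment of each response. The `_FakeEmbed` stand-in exists only so the snippet runs without an API key or network access. To call the API, pass a real `TwelveLabs` client instead.

```python
from types import SimpleNamespace
from typing import Any, Dict, List


def embed_files(
    client: Any, paths: List[str], engine_name: str = "Marengo-retrieval-2.6"
) -> Dict[str, List[float]]:
    """Hypothetical helper: embed each file and return its first segment's vector."""
    vectors: Dict[str, List[float]] = {}
    for path in paths:
        # One request per file, using the same parameters as the single-file example.
        res = client.embed.create(engine_name=engine_name, audio_file=path)
        if res.audio_embedding is None or not res.audio_embedding.segments:
            continue  # Skip files whose response carries no audio segments.
        vectors[path] = res.audio_embedding.segments[0].embeddings_float
    return vectors


class _FakeEmbed:
    """Stand-in for client.embed that returns a tiny fixed vector (illustration only)."""

    def create(self, engine_name: str, audio_file: str) -> Any:
        segment = SimpleNamespace(embeddings_float=[float(len(audio_file))] * 4)
        return SimpleNamespace(
            engine_name=engine_name,
            audio_embedding=SimpleNamespace(segments=[segment]),
        )


fake_client = SimpleNamespace(embed=_FakeEmbed())
print(embed_files(fake_client, ["a.wav", "bb.mp3"]))
```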