This guide shows how you can create audio embeddings.

The following table lists the available models for generating audio embeddings and their key characteristics:

Model	Description	Dimensions	Max length	Similarity metric
Marengo-retrieval-2.7	Use this model to create embeddings that you can use in various downstream tasks	1024	10 seconds	Cosine similarity

Note that the Marengo video understanding model generates embeddings for all modalities in the same latent space. This shared space enables any-to-any searches across different types of content.

The platform processes audio files up to 10 seconds in length. Files longer than 10 seconds are automatically truncated.

Prerequisites

To use the platform, you need an API key:

1
If you don’t have an account, sign up for a free account.
2
Go to the API Keys page.
3
Select the Copy icon next to your key.

Ensure the TwelveLabs SDK is installed on your computer:

$ pip install twelvelabs

The audio files you wish to use must meet the following requirements:
- Format: WAV (uncompressed), MP3 (lossy), and FLAC (lossless)
- File size: Must not exceed 10MB.

Complete example

This complete example shows how you can create audio embeddings. Ensure you replace the placeholders surrounded by <> with your values.

1 from typing import List
2 
3 from twelvelabs import TwelveLabs
4 from twelvelabs.types import BaseSegment
5 
6 # 1. Initialize the client
7 client = TwelveLabs(api_key="<YOUR_API_KEY>")
8 
9 # 2. Create audio embeddings
10 res = client.embed.create(
11     model_name="Marengo-retrieval-2.7",
12     audio_url="<YOUR_AUDIO_URL>",
13     # audio_start_offset_sec=2
14 )
15 
16 # 3. Process the results
17 def print_segments(segments: List[BaseSegment], max_elements: int = 5):
18     for segment in segments:
19         first_few = segment.float_[:max_elements]
20         print(
21             f"  embeddings: [{', '.join(str(x) for x in first_few)}...] (total: {len(segment.float_)} values)"
22         )
23 
24 
25 print("Created audio embedding")
26 if res.audio_embedding is not None and res.audio_embedding.segments is not None:
27     print_segments(res.audio_embedding.segments)

Step-by-step guide

Python

Node.js

Import the SDK and initialize the client

Create a client instance to interact with the TwelveLabs Video Understanding Platform.
Function call: You call the constructor of the TwelveLabs class.
Parameters:

api_key: The API key to authenticate your requests to the platform.

Return value: An object of type TwelveLabs configured for making API calls.

Create audio embeddings

Function call: You call the embed.create function.
Parameters:

model_name: The name of the model you want to use (“Marengo-retrieval-2.7”).
audio_url or audio_file: The publicly accessible URL or the path of your audio file.
(Optional) audio_start_offset_sec: The start time, in seconds, from which the platform generates the audio embeddings. This parameter allows you to skip the initial portion of the audio during processing.

Return value: The response contains the following fields:

audio_embedding: An object that contains the embedding data for your audio file. It includes the following fields:
- segments: An object that contains the following:
- float_: An array of floats representing the embedding
- start_offset_sec: The start time.
- metadata: An object that contains metadata about the embedding.
model_name: The name of the video understanding model the platform has used to create this embedding.

Process the results

This example prints the results to the standard output.