Audio embeddings

This guide shows how you can create audio embeddings.

The following table lists the available models for generating audio embeddings and their key characteristics:

| Model | Description | Dimensions | Max length | Similarity metric |
| --- | --- | --- | --- | --- |
| Marengo-retrieval-2.7 | Use this model to create embeddings that you can use in various downstream tasks | 1024 | 10 seconds | Cosine similarity |

Note that the Marengo video understanding model generates embeddings for all modalities in the same latent space. This shared space enables any-to-any searches across different types of content.

The platform processes audio files up to 10 seconds in length. Files longer than 10 seconds are automatically truncated.
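Because clips longer than 10 seconds are truncated, you may want to check a clip's duration locally before sending it. Below is a minimal sketch using Python's standard wave module; it applies to uncompressed WAV files only, and wav_duration_sec is a hypothetical helper, not part of the TwelveLabs SDK:

```python
import io
import wave


def wav_duration_sec(source) -> float:
    """Return the duration of a WAV file (path or file-like object) in seconds."""
    # Hypothetical helper, not part of the TwelveLabs SDK
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


# Demo: synthesize 12 seconds of 16-bit mono silence at 16 kHz in memory
buf = io.BytesIO()
with wave.open(buf, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)  # 16-bit samples
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 16000 * 12)
buf.seek(0)

duration = wav_duration_sec(buf)
if duration > 10:
    print(f"Warning: {duration:.1f}s clip will be truncated to 10s")
```

For MP3 or FLAC files you would need a third-party library to read the duration, since the standard library only parses WAV.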

Prerequisites

  • To use the platform, you need an API key:

    1. If you don’t have an account, sign up for a free account.
    2. Go to the API Key page.
    3. Select the Copy icon next to your key.

  • Ensure the TwelveLabs SDK is installed on your computer:

    $ pip install twelvelabs
  • The audio files you wish to use must meet the following requirements:

    • Format: WAV (uncompressed), MP3 (lossy), and FLAC (lossless)
    • File size: Must not exceed 10MB.
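The format and size requirements above can be checked locally before you upload anything. Below is a minimal sketch; validate_audio_file is a hypothetical helper (not part of the SDK), and interpreting 10MB as 10 × 1024 × 1024 bytes is an assumption:

```python
import os
import tempfile

ALLOWED_EXTENSIONS = {".wav", ".mp3", ".flac"}
MAX_SIZE_BYTES = 10 * 1024 * 1024  # assumption: 10 MB = 10 * 1024 * 1024 bytes


def validate_audio_file(path: str) -> list:
    """Return a list of problems; an empty list means the file looks acceptable."""
    # Hypothetical helper, not part of the TwelveLabs SDK
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or 'no extension'}")
    if os.path.getsize(path) > MAX_SIZE_BYTES:
        problems.append("file exceeds 10 MB")
    return problems


# Demo with temporary files
with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
    f.write(b"\x00" * 1024)
    wav_path = f.name
with tempfile.NamedTemporaryFile(suffix=".ogg", delete=False) as f:
    f.write(b"\x00")
    ogg_path = f.name

problems = validate_audio_file(wav_path)
problems_bad = validate_audio_file(ogg_path)
print(problems)
print(problems_bad)
```

Note that this checks only the file extension, not the actual encoding; a misnamed file would still be rejected by the platform.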

Complete example

This complete example shows how you can create audio embeddings. Ensure you replace the placeholders surrounded by <> with your values.

from typing import List

from twelvelabs import TwelveLabs
from twelvelabs.types import BaseSegment

# 1. Initialize the client
client = TwelveLabs(api_key="<YOUR_API_KEY>")

# 2. Create audio embeddings
res = client.embed.create(
    model_name="Marengo-retrieval-2.7",
    audio_url="<YOUR_AUDIO_URL>",
    # audio_start_offset_sec=2
)

# 3. Process the results
def print_segments(segments: List[BaseSegment], max_elements: int = 5):
    for segment in segments:
        first_few = segment.float_[:max_elements]
        print(
            f" embeddings: [{', '.join(str(x) for x in first_few)}...] (total: {len(segment.float_)} values)"
        )


print("Created audio embedding")
if res.audio_embedding is not None and res.audio_embedding.segments is not None:
    print_segments(res.audio_embedding.segments)

Step-by-step guide

1. Import the SDK and initialize the client

Create a client instance to interact with the TwelveLabs Video Understanding Platform.
Function call: You call the constructor of the TwelveLabs class.
Parameters:

  • api_key: The API key to authenticate your requests to the platform.

Return value: An object of type TwelveLabs configured for making API calls.

2. Create audio embeddings

Function call: You call the embed.create function.
Parameters:

  • model_name: The name of the model you want to use ("Marengo-retrieval-2.7").
  • audio_url or audio_file: The publicly accessible URL or the path of your audio file.
  • (Optional) audio_start_offset_sec: The start time, in seconds, from which the platform generates the audio embeddings. This parameter allows you to skip the initial portion of the audio during processing.

Return value: The response contains the following fields:

  • audio_embedding: An object that contains the embedding data for your audio file. It includes the following fields:
    • segments: An array of objects, each of which contains the following:
      • float_: An array of floats representing the embedding.
      • start_offset_sec: The start time of the segment, in seconds.
    • metadata: An object that contains metadata about the embedding.
  • model_name: The name of the video understanding model the platform has used to create this embedding.

3. Process the results

This example prints the results to the standard output.
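Beyond printing, a common downstream task is comparing two embeddings. Per the table above, Marengo-retrieval-2.7 embeddings are compared with cosine similarity. Below is a minimal pure-Python sketch; cosine_similarity is a hypothetical helper, not part of the SDK, and the toy 4-dimensional vectors stand in for real 1024-dimensional embeddings:

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    # Hypothetical helper, not part of the TwelveLabs SDK
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for 1024-dimensional Marengo embeddings
v1 = [0.1, 0.3, -0.2, 0.7]
v2 = [0.1, 0.3, -0.2, 0.7]
v3 = [-0.7, 0.2, 0.5, -0.1]

sim_same = cosine_similarity(v1, v2)  # close to 1.0 for identical vectors
sim_diff = cosine_similarity(v1, v3)
print(sim_same)
print(sim_diff)
```

Because Marengo generates embeddings for all modalities in the same latent space, the same comparison works between an audio embedding and, for example, a text or video embedding.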