Audio embeddings

This guide shows how you can create audio embeddings.

The following table lists the available models for generating audio embeddings and their key characteristics:

| Model | Description | Dimensions | Max length | Similarity metric |
| --- | --- | --- | --- | --- |
| Marengo-retrieval-2.7 | Use this model to create embeddings that you can use in various downstream tasks | 1024 | 10 seconds | Cosine similarity |

Note that the Marengo video understanding model generates embeddings for all modalities in the same latent space. This shared space enables any-to-any searches across different types of content.

The platform processes audio files up to 10 seconds in length. Files longer than 10 seconds are automatically truncated.
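Because clips longer than 10 seconds are truncated, you may want to check a clip's duration locally before sending it. Below is a minimal sketch using Python's standard wave module; it applies to uncompressed WAV files only, and wav_duration_sec is a hypothetical helper, not part of the TwelveLabs SDK:

```python
import io
import wave


def wav_duration_sec(source) -> float:
    """Return the duration of a WAV file (path or file-like object) in seconds."""
    # Hypothetical helper, not part of the TwelveLabs SDK
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


# Demo: synthesize 12 seconds of 16-bit mono silence at 16 kHz in memory
buf = io.BytesIO()
with wave.open(buf, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)  # 16-bit samples
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 16000 * 12)
buf.seek(0)

duration = wav_duration_sec(buf)
if duration > 10:
    print(f"Warning: {duration:.1f}s clip will be truncated to 10s")
```

For MP3 or FLAC files you would need a third-party library to read the duration, since the standard library only parses WAV.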

Prerequisites

  • To use the platform, you need an API key:

    1. If you don’t have an account, sign up for a free account.
    2. Go to the API Key page.
    3. Select the Copy icon next to your key.

  • Ensure the TwelveLabs SDK is installed on your computer:

    $ pip install twelvelabs
  • The audio files you wish to use must meet the following requirements:

    • Format: WAV (uncompressed), MP3 (lossy), and FLAC (lossless)
    • File size: Must not exceed 10MB.
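The format and size requirements above can be checked locally before you upload anything. Below is a minimal sketch; validate_audio_file is a hypothetical helper (not part of the SDK), and interpreting 10MB as 10 × 1024 × 1024 bytes is an assumption:

```python
import os
import tempfile

ALLOWED_EXTENSIONS = {".wav", ".mp3", ".flac"}
MAX_SIZE_BYTES = 10 * 1024 * 1024  # assumption: 10 MB = 10 * 1024 * 1024 bytes


def validate_audio_file(path: str) -> list:
    """Return a list of problems; an empty list means the file looks acceptable."""
    # Hypothetical helper, not part of the TwelveLabs SDK
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or 'no extension'}")
    if os.path.getsize(path) > MAX_SIZE_BYTES:
        problems.append("file exceeds 10 MB")
    return problems


# Demo with temporary files
with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
    f.write(b"\x00" * 1024)
    wav_path = f.name
with tempfile.NamedTemporaryFile(suffix=".ogg", delete=False) as f:
    f.write(b"\x00")
    ogg_path = f.name

problems = validate_audio_file(wav_path)
problems_bad = validate_audio_file(ogg_path)
print(problems)
print(problems_bad)
```

Note that this checks only the file extension, not the actual encoding; a misnamed file would still be rejected by the platform.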

Complete example

This complete example shows how you can create audio embeddings. Ensure you replace the placeholders surrounded by <> with your values.

from typing import List

from twelvelabs import TwelveLabs
from twelvelabs.types import BaseSegment

# 1. Initialize the client
client = TwelveLabs(api_key="<YOUR_API_KEY>")

# 2. Create audio embeddings
res = client.embed.create(
    model_name="Marengo-retrieval-2.7",
    audio_url="<YOUR_AUDIO_URL>",
    # audio_start_offset_sec=2
)

# 3. Process the results
def print_segments(segments: List[BaseSegment], max_elements: int = 5):
    for segment in segments:
        first_few = segment.float_[:max_elements]
        print(
            f" embeddings: [{', '.join(str(x) for x in first_few)}...] (total: {len(segment.float_)} values)"
        )


print("Created audio embedding")
if res.audio_embedding is not None and res.audio_embedding.segments is not None:
    print_segments(res.audio_embedding.segments)

Step-by-step guide

1. Import the SDK and initialize the client

Create a client instance to interact with the TwelveLabs Video Understanding Platform.
Function call: You call the constructor of the TwelveLabs class.
Parameters:

  • api_key: The API key to authenticate your requests to the platform.

Return value: An object of type TwelveLabs configured for making API calls.

2. Create audio embeddings

Function call: You call the embed.create function.
Parameters:

  • model_name: The name of the model you want to use ("Marengo-retrieval-2.7").
  • audio_url or audio_file: The publicly accessible URL or the path of your audio file.
  • (Optional) audio_start_offset_sec: The start time, in seconds, from which the platform generates the audio embeddings. This parameter allows you to skip the initial portion of the audio during processing.

Return value: The response contains the following fields:

  • audio_embedding: An object that contains the embedding data for your audio file. It includes the following fields:
    • segments: An array of objects, each of which contains the following:
      • float_: An array of floats representing the embedding.
      • start_offset_sec: The start time of the segment, in seconds.
    • metadata: An object that contains metadata about the embedding.
  • model_name: The name of the video understanding model the platform has used to create this embedding.

3. Process the results

This example prints the results to the standard output.
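Beyond printing, a common downstream task is comparing two embeddings. Per the table above, Marengo-retrieval-2.7 embeddings are compared with cosine similarity. Below is a minimal pure-Python sketch; cosine_similarity is a hypothetical helper, not part of the SDK, and the toy 4-dimensional vectors stand in for real 1024-dimensional embeddings:

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    # Hypothetical helper, not part of the TwelveLabs SDK
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for 1024-dimensional Marengo embeddings
v1 = [0.1, 0.3, -0.2, 0.7]
v2 = [0.1, 0.3, -0.2, 0.7]
v3 = [-0.7, 0.2, 0.5, -0.1]

sim_same = cosine_similarity(v1, v2)  # close to 1.0 for identical vectors
sim_diff = cosine_similarity(v1, v3)
print(sim_same)
print(sim_diff)
```

Because Marengo generates embeddings for all modalities in the same latent space, the same comparison works between an audio embedding and, for example, a text or video embedding.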