Audio embeddings
This guide shows how you can create audio embeddings.
The following table lists the available models for generating audio embeddings and their key characteristics:
Note that the “Marengo-retrieval-2.7” video understanding model generates embeddings for all modalities in the same latent space. This shared space enables any-to-any searches across different types of content.
The platform processes audio files up to 10 seconds in length. Files longer than 10 seconds are automatically truncated.
Prerequisites
- To use the platform, you need an API key.
- Ensure the TwelveLabs SDK is installed on your computer.
- The audio files you wish to use must meet the following requirements:
  - Format: WAV (uncompressed), MP3 (lossy), and FLAC (lossless).
  - File size: Must not exceed 10 MB.
Complete example
This complete example shows how you can create audio embeddings. Ensure you replace the placeholders surrounded by <> with your values.
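A sketch of the full flow is shown below. It assumes the `twelvelabs` package is installed and follows the parameter names described in this guide; the import is deferred and the call is commented out so the snippet can be loaded without credentials.

```python
def create_audio_embedding(api_key: str, audio_url: str):
    """Create audio embeddings for a publicly accessible audio file (sketch)."""
    # Requires `pip install twelvelabs`; the import is deferred so this
    # file can be loaded even when the SDK is not installed.
    from twelvelabs import TwelveLabs

    client = TwelveLabs(api_key=api_key)
    # Generate embeddings with the Marengo-retrieval-2.7 model.
    res = client.embed.create(
        model_name="Marengo-retrieval-2.7",
        audio_url=audio_url,  # or audio_file="<YOUR_FILE_PATH>" for a local file
    )
    return res

# Runs only with real credentials, e.g.:
# res = create_audio_embedding("<YOUR_API_KEY>", "<YOUR_AUDIO_URL>")
```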
Step-by-step guide
Import the SDK and initialize the client
Create a client instance to interact with the TwelveLabs Video Understanding platform.
Function call: You call the constructor of the TwelveLabs class.
Parameters:
- api_key: The API key to authenticate your requests to the platform.
Return value: An object of type TwelveLabs configured for making API calls.
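A minimal sketch of this step, assuming the `twelvelabs` package is installed; the import is deferred so the snippet loads even without the SDK present:

```python
def make_client(api_key: str):
    # Deferred import: requires `pip install twelvelabs`.
    from twelvelabs import TwelveLabs
    # The returned client authenticates every request with the given key.
    return TwelveLabs(api_key=api_key)

# client = make_client("<YOUR_API_KEY>")
```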
Create audio embeddings
Function call: You call the embed.create function.
Parameters:
- model_name: The name of the model you want to use (“Marengo-retrieval-2.7”).
- audio_file or audio_url: The path or the publicly accessible URL of your audio file, respectively.
- (Optional) audio_start_offset_sec: The start time, in seconds, from which the platform generates the audio embeddings. This parameter allows you to skip the initial portion of the audio during processing.
Return value: The response contains the following fields:
- audio_embedding: An object that contains the embedding data for your audio file. It includes the following fields:
  - segments: An array of objects, each of which contains the following:
    - float: An array of floats representing the embedding.
    - start_offset_sec: The start time, in seconds, of the segment.
  - metadata: An object that contains metadata about the embedding. It includes the following field:
    - model_name: The name of the video understanding model the platform has used to create this embedding.
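The response shape described above can be sketched with plain dataclasses. These are hypothetical stand-ins for the SDK's own types (actual attribute names may differ slightly by SDK version) and only show how you might iterate over the returned segments.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical stand-ins mirroring the response fields described above.
@dataclass
class Segment:
    float_: List[float]      # the embedding vector ("float" in the raw response)
    start_offset_sec: float  # where this segment starts in the audio

@dataclass
class Metadata:
    model_name: str          # model used to create the embedding

@dataclass
class AudioEmbedding:
    segments: List[Segment]
    metadata: Metadata

@dataclass
class EmbedResponse:
    audio_embedding: AudioEmbedding

# Example response with a single 3-dimensional segment (illustrative values).
res = EmbedResponse(
    audio_embedding=AudioEmbedding(
        segments=[Segment(float_=[0.1, -0.2, 0.3], start_offset_sec=0.0)],
        metadata=Metadata(model_name="Marengo-retrieval-2.7"),
    )
)

# Iterate over segments, printing each start time and vector length.
for seg in res.audio_embedding.segments:
    print(seg.start_offset_sec, len(seg.float_))
```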