Create embeddings for text, image, and audio

This method creates embeddings for text, image, and audio content.

Before you create an embedding, ensure that the following prerequisites are met:

Parameters for embeddings (see the example request after this list):

  • Common parameters:
    • engine_name: The video understanding engine you want to use. Example: "Marengo-retrieval-2.6".
  • Text embeddings:
    • text: Text for which to create an embedding.
  • Image embeddings:
    Provide one of the following:
    • image_url: Publicly accessible URL of your image file.
    • image_file: Local image file.
  • Audio embeddings:
    Provide one of the following:
    • audio_url: Publicly accessible URL of your audio file.
    • audio_file: Local audio file.
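
As a rough sketch, the request below combines the text, image, and audio parameters described above in a single call. Only the parameter names and the engine name come from this page; the endpoint URL, header name, and media URLs are placeholders you should replace with the values from the API reference.

```python
import os
import requests

# Placeholder endpoint and auth header name: substitute the values from the API reference.
EMBED_URL = "https://api.example.com/embed"
headers = {"x-api-key": os.environ["API_KEY"]}

# Request text, image, and audio embeddings in one call by combining the
# per-modality parameters. Use image_file/audio_file (via the files= argument)
# instead of the *_url fields when uploading local files.
data = {
    "engine_name": "Marengo-retrieval-2.6",
    "text": "A dog playing fetch on the beach",
    "image_url": "https://example.com/dog.jpg",   # placeholder URL
    "audio_url": "https://example.com/waves.mp3", # placeholder URL
}

response = requests.post(EMBED_URL, headers=headers, data=data)
response.raise_for_status()
print(response.json().keys())
```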

NOTES:

  • The "Marengo-retrieval-2.6" video understanding engine generates embeddings for all modalities in the same latent space. This shared space enables any-to-any searches across different types of content (see the sketch after these notes).
  • You can create multiple types of embeddings in a single API call.
  • Audio embeddings combine generic sound and human speech into a single embedding. For videos with transcriptions, you can retrieve the transcription and then create a text embedding from it.
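
To illustrate what the shared latent space enables, the sketch below compares a text embedding with an audio embedding using cosine similarity. The vectors are made-up placeholders standing in for embeddings returned by the API; any similarity metric you prefer works the same way.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Because all modalities share one latent space, a text embedding can be
# compared directly with an image or audio embedding.
text_vec = [0.12, -0.03, 0.88]   # illustrative values, not real embeddings
audio_vec = [0.10, 0.01, 0.91]   # illustrative values, not real embeddings
print(cosine_similarity(text_vec, audio_vec))
```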

🚧 Important

The response includes breaking changes that might require updates to your application code.
Common changes:

  • The is_success boolean flag has been removed.

Media-specific changes:

  • Text and audio: The embedding vectors are now nested under an array named segments (see the sketch below).
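
The sketch below shows how code that previously read a top-level vector (or checked is_success) would adapt to the new nesting. Only the segments key comes from this note; the surrounding key names used here ("text_embedding", "embedding") are assumptions, so verify them against a real response.

```python
# Hypothetical response shape, shown only to illustrate the new nesting:
# text and audio embedding vectors now sit inside a "segments" array.
result = {
    "text_embedding": {
        "segments": [
            {"embedding": [0.12, -0.03, 0.88]},  # placeholder vector
        ]
    }
}

# Instead of reading one top-level vector, iterate over the segments array.
for segment in result["text_embedding"]["segments"]:
    vector = segment["embedding"]
    print(len(vector))
```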