Create embeddings for text, image, and audio

This method creates embeddings for text, image, and audio content.

Before you create an embedding, ensure that you have a valid API key to authenticate your requests.

Parameters for embeddings:

  • Common parameters:
    • model_name: The video understanding model you want to use. Example: "Marengo-retrieval-2.7".
  • Text embeddings:
    • text: Text for which to create an embedding.
  • Image embeddings:
    Provide one of the following:
    • image_url: Publicly accessible URL of your image file.
    • image_file: Local image file.
  • Audio embeddings:
    Provide one of the following:
    • audio_url: Publicly accessible URL of your audio file.
    • audio_file: Local audio file.
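As a sketch, the parameters above can be assembled into a single request payload. Only the field names (`model_name`, `text`, `image_url`, `image_file`, `audio_url`, `audio_file`) come from the list above; the helper function is a hypothetical illustration, not part of the API, and enforces the "provide one of the following" rule per modality.

```python
# Illustrative sketch: collect form fields for one embedding request.
# The field names come from the parameter list above; this helper
# itself is hypothetical, not part of the API.

def build_embedding_payload(model_name, text=None,
                            image_url=None, image_file=None,
                            audio_url=None, audio_file=None):
    """Collect non-empty fields, enforcing 'one of' per modality."""
    if image_url and image_file:
        raise ValueError("Provide image_url or image_file, not both")
    if audio_url and audio_file:
        raise ValueError("Provide audio_url or audio_file, not both")
    fields = {"model_name": model_name}
    for key, value in [("text", text),
                       ("image_url", image_url), ("image_file", image_file),
                       ("audio_url", audio_url), ("audio_file", audio_file)]:
        if value is not None:
            fields[key] = value
    return fields

# Example: create text and audio embeddings in one call.
payload = build_embedding_payload(
    "Marengo-retrieval-2.7",
    text="a dog chasing a ball",
    audio_url="https://example.com/clip.wav",
)
```

Because multiple modalities can share one call, the same payload may carry `text` alongside an image or audio source.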

NOTES:

  • The "Marengo-retrieval-2.7" video understanding model generates embeddings for all modalities in the same latent space. This shared space enables any-to-any searches across different types of content.
  • You can create multiple types of embeddings in a single API call.
  • Audio embeddings combine generic sound and human speech into a single embedding. For videos with transcriptions, you can retrieve the transcription and then create text embeddings from it.
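Because all modalities share one latent space, an embedding from one modality can be compared directly against an embedding from another, typically with cosine similarity. The vectors below are made-up toy values for illustration; real embeddings have many more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for a text embedding and an audio embedding.
text_vec = [0.1, 0.3, 0.5]
audio_vec = [0.2, 0.1, 0.4]

score = cosine_similarity(text_vec, audio_vec)  # higher = more related
```

In an any-to-any search, you would rank candidate embeddings of any modality by this score against a single query embedding.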