The embedding object has the following fields:

  • model_name: A string containing the name of the video understanding model the platform used to create the embeddings.

  • One or more of the following embedding fields, depending on the parameters specified in the request:

    • text_embedding
    • audio_embedding
    • image_embedding
      Each of these fields is an object that contains, among other information, the following field:
      • segments: An array of objects containing the embeddings for each segment and associated information. Among other information, each of these objects contains an array of floating-point numbers named float, which represents the segment's embedding. This array has 1024 dimensions, and you can use it with cosine similarity for downstream tasks such as search, clustering, and classification.
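
For illustration, here is a minimal sketch of how you might read the segment embeddings from such an object after parsing the response into a Python dictionary. Only the fields described above are documented; the placeholder values and the surrounding dictionary literal are assumptions for the example.

```python
# A minimal sketch of the embedding object's shape, assuming the response
# has been parsed into a Python dict. The placeholder vector stands in for
# a real 1024-dimensional embedding.
embedding = {
    "model_name": "Marengo-retrieval-2.7",
    "text_embedding": {
        "segments": [
            {"float": [0.0] * 1024},  # placeholder 1024-dimensional embedding
        ],
    },
}

# Collect each segment's embedding vector, per modality. Each modality
# field is present only if it was requested.
for field in ("text_embedding", "audio_embedding", "image_embedding"):
    modality = embedding.get(field)
    if modality is None:
        continue
    for segment in modality["segments"]:
        vector = segment["float"]  # 1024 floating-point numbers
        print(field, len(vector))
```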

The “Marengo-retrieval-2.7” video understanding model generates embeddings for all modalities in the same latent space. This shared space enables any-to-any searches across different types of content.
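
Because all modalities share the same latent space, you can compare embeddings across modalities directly. The following sketch computes the cosine similarity between a text-segment embedding and an audio-segment embedding; the vectors here are hypothetical placeholders standing in for the float arrays described above.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical 1024-dimensional embeddings taken from the "float" field of
# a text segment and an audio segment. Because both vectors live in the
# same latent space, their similarity is directly meaningful.
text_vector = [0.1] * 1024   # placeholder
audio_vector = [0.1] * 1024  # placeholder

score = cosine_similarity(text_vector, audio_vector)
print(f"text-to-audio similarity: {score:.4f}")
```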