Examples

This page shows examples of using the Marengo and Pegasus video understanding models. The screenshots in the sections below are from the Playground, but the same principles apply when you invoke the API programmatically.

Marengo

This section contains examples of using the Marengo video understanding model.

Steve Jobs introducing the iPhone

In the example screenshot below, the query was “How did Steve Jobs introduce the iPhone?”. The Marengo video understanding model used information found in the visual and conversation modalities to perform the following tasks:

Visual recognition of a famous person (Steve Jobs)
Joint speech and visual recognition to semantically search for the moment when Steve Jobs introduced the iPhone. Semantic search finds information based on the intended meaning of the query rather than its literal words, so the platform identifies the matching video fragments even if Steve Jobs didn’t explicitly say the words in the query.

To see this example in the Playground, ensure you’re logged in, and then open this URL in your browser.
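As a programmatic counterpart to the Steve Jobs query above, the following minimal sketch assembles a search request that names the two modalities involved. It is not the platform's actual API: the helper `build_search_request` and the field names `index_id`, `query`, and `search_options` are assumptions made for illustration, and `<YOUR_INDEX_ID>` is a placeholder. Consult the API reference for the real request shape.

```python
import json


def build_search_request(index_id: str, query: str, modalities: list[str]) -> dict:
    """Assemble a hypothetical search payload.

    The field names used here are illustrative assumptions,
    not the platform's documented request schema.
    """
    if not query.strip():
        raise ValueError("query must not be empty")
    return {
        "index_id": index_id,
        "query": query,
        # De-duplicate and sort so the payload is deterministic.
        "search_options": sorted(set(modalities)),
    }


payload = build_search_request(
    index_id="<YOUR_INDEX_ID>",  # placeholder, not a real identifier
    query="How did Steve Jobs introduce the iPhone?",
    modalities=["visual", "conversation"],
)
print(json.dumps(payload, indent=2))
```

The query is passed as natural language rather than as keywords, which matches the semantic search behavior described above.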

Polar bear holding a Coca-Cola bottle

In the example screenshot below, the query was “Polar bear holding a Coca-Cola bottle.” The Marengo video understanding model used information found in the visual and logo modalities to perform the following tasks:

Recognition of a cartoon character (polar bear)
Identification of an object (bottle)
Detection of a specific brand logo (Coca-Cola)
Identification of an action (polar bear holding a bottle)

To see this example in the Playground, ensure you’re logged in, and then open this URL in your browser.

Using different languages

This section provides examples of using different languages to perform search requests.

Spanish

In the example screenshot below, the query was “¿Cómo presentó Steve Jobs el iPhone?” (“How did Steve Jobs introduce the iPhone?”). The Marengo video understanding model used information from the visual and audio modalities.

To see this example in the Playground, ensure you’re logged in, and then open this URL in your browser.

Chinese

In the example screenshot below, the query was “猫做有趣的事情” (“Cats doing funny things”). The Marengo video understanding model used information from the visual modality.

To see this example in the Playground, ensure you’re logged in, and then open this URL in your browser.
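When sending non-English queries like the ones above from your own code, the main practical concern is keeping the query text intact during serialization. The sketch below shows one way to do that with Python's standard `json` module. The helper `build_search_body` and the field names `index_id` and `query` are hypothetical and only stand in for whatever request shape the API actually expects.

```python
import json


def build_search_body(index_id: str, query: str) -> str:
    """Serialize a hypothetical search body without escaping non-ASCII text.

    The field names are illustrative assumptions, not the documented schema.
    """
    # ensure_ascii=False keeps characters such as "¿" and "猫" readable
    # instead of emitting \uXXXX escape sequences.
    return json.dumps({"index_id": index_id, "query": query}, ensure_ascii=False)


queries = {
    "Spanish": "¿Cómo presentó Steve Jobs el iPhone?",
    "Chinese": "猫做有趣的事情",
}

bodies = {
    language: build_search_body("<YOUR_INDEX_ID>", text)  # placeholder index ID
    for language, text in queries.items()
}

for language, body in bodies.items():
    print(language, body)
```

Either form of the JSON (escaped or unescaped) decodes to the same string; the unescaped form is simply easier to read in logs.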

Pegasus

This section contains examples of using the Pegasus video understanding model.

Summarizing educational videos

In the example screenshot below, the platform has summarized an educational video using predefined templates without any customization:

To see this example in the Playground, ensure you’re logged in, and then open this URL in your browser.

In the example screenshot below, the prompt instructs the platform to generate a caption for a social media post:

To see this example in the Playground, ensure you’re logged in, and then open this URL in your browser.

Writing police reports

In the example screenshot below, the prompt instructs the platform to write a police report using a specific template for a video showing a robbery:

To see this example in the Playground, ensure you’re logged in, and then open this URL in your browser.

Using different languages

This section provides examples of using different languages to generate text from videos.

Spanish

The following example summarizes a video, indicating that the response should be in Spanish. Note that the prompt is in English, and the output is in Spanish.

To see this example in the Playground, ensure you’re logged in, and then open this URL in your browser.
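The pattern described above, an English prompt that asks for a response in another language, can be sketched as a small prompt-building helper. The function `build_summary_prompt` and the exact wording of the instruction are assumptions for illustration; the source does not show the prompt that was used in the Playground.

```python
def build_summary_prompt(task: str, output_language: str) -> str:
    """Compose an English prompt that requests output in another language.

    Hypothetical helper: the phrasing is an illustrative assumption.
    """
    # Normalize the task so the result has exactly one period after it.
    task = task.strip().rstrip(".")
    return f"{task}. Respond in {output_language}."


prompt = build_summary_prompt("Summarize this video", "Spanish")
print(prompt)  # prints "Summarize this video. Respond in Spanish."
```

Keeping the target language as a parameter makes it easy to reuse the same English instruction for other output languages.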

French

The following example summarizes the three main takeaways from a video. Note that both the prompt and the output are in French.

To see this example in the Playground, ensure you’re logged in, and then open this URL in your browser.