Pegasus is a generative model for video-to-text generation. The current version is Pegasus 1.2.

Pegasus analyzes multiple modalities to generate contextually relevant text based on the content of your videos.

Key features

Video-to-text generation: Creates detailed textual descriptions based on video content
Extended processing capacity: Processes videos up to 1 hour in length
Granular visual comprehension: Analyzes objects, on-screen text, and numerical content
Temporal grounding: Accurately identifies timestamps of specific events
Multimodal understanding: Combines visual, audio, and textual information for comprehensive analysis

Use cases

Content summarization: Generate concise summaries of video content
Detailed descriptions: Create comprehensive textual descriptions of visual scenes
Timestamp identification: Answer questions about when specific events occur in videos
Content analysis: Extract key information from video content for further processing

Input requirements

The specifications on this page reflect the maximum capabilities of the model. The limits that apply to your videos depend on the upload method and operation you choose. For details about the available upload methods and the corresponding limits, see the Upload and processing methods page.

Video file requirements

Version	Duration	File Size	Resolution	Aspect Ratio	Formats
Pegasus 1.2	4 sec - 1 hour (2 hours in a future release)	≤ 2 GB	360x360 - 5184x2160	Between 1:1 and 1:2.4, or between 2.4:1 and 1:1. For example, you can use 1:1, 4:3, 4:5, 5:4, 16:9, 9:16, or 17:9.	FFmpeg supported
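
As a rough illustration of how the limits in the table above could be checked before uploading, the following Python sketch validates a video's metadata locally. It is not part of any SDK. The VideoInfo container and the check_video function are hypothetical names. Reading "2 GB" as 2 GiB and treating the resolution and aspect ratio limits as orientation-independent (so that portrait videos such as 9:16 pass) are assumptions made for this example.

```python
from dataclasses import dataclass

# Limits copied from the Pegasus 1.2 row of the table above.
MIN_DURATION_S = 4
MAX_DURATION_S = 60 * 60            # 1 hour
MAX_FILE_SIZE_BYTES = 2 * 1024**3   # assumption: "2 GB" is read as 2 GiB
MIN_SIDE_PX = 360
MAX_LONG_SIDE_PX = 5184             # assumption: applies to the longer side
MAX_SHORT_SIDE_PX = 2160            # assumption: applies to the shorter side
MAX_ASPECT = 2.4                    # longer side divided by shorter side


@dataclass
class VideoInfo:
    """Hypothetical metadata container; fill it from your own probing tool."""
    duration_s: float
    size_bytes: int
    width: int
    height: int


def check_video(info: VideoInfo) -> list[str]:
    """Return a list of problems; an empty list means the video is within limits."""
    problems = []
    if not MIN_DURATION_S <= info.duration_s <= MAX_DURATION_S:
        problems.append("duration must be between 4 seconds and 1 hour")
    if info.size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append("file size must not exceed 2 GB")
    short, long_ = sorted((info.width, info.height))
    if short < MIN_SIDE_PX or long_ > MAX_LONG_SIDE_PX or short > MAX_SHORT_SIDE_PX:
        problems.append("resolution must be between 360x360 and 5184x2160")
    if short > 0 and long_ / short > MAX_ASPECT:
        problems.append("aspect ratio must be between 1:2.4 and 2.4:1")
    return problems
```

For example, check_video(VideoInfo(120, 500_000_000, 1920, 1080)) returns an empty list, while a 2-second clip is reported with a duration problem. The limits enforced by the platform for a given upload method may be stricter than the ones checked here.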

Audio and video stream durations must not differ by more than 0.5 seconds.
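
One possible way to check the 0.5-second rule locally is to compare the per-stream durations reported by ffprobe. The sketch below assumes you have already captured JSON output from a command such as ffprobe -v error -show_entries stream=codec_type,duration -of json input.mp4. The function names are hypothetical, and treating a file with a missing or unreported stream duration as passing is an assumption of this example, not documented platform behavior.

```python
import json

MAX_STREAM_DRIFT_S = 0.5  # limit stated above


def stream_durations(ffprobe_json: str) -> dict[str, float]:
    """Map "video" and "audio" to the first reported duration of each, in seconds."""
    durations: dict[str, float] = {}
    for stream in json.loads(ffprobe_json).get("streams", []):
        kind = stream.get("codec_type")
        if kind not in ("video", "audio") or kind in durations:
            continue
        try:
            durations[kind] = float(stream["duration"])
        except (KeyError, ValueError):
            # Some containers do not report a per-stream duration; skip them.
            continue
    return durations


def durations_match(ffprobe_json: str) -> bool:
    """Return True if the audio and video durations differ by at most 0.5 seconds."""
    found = stream_durations(ffprobe_json)
    if "video" not in found or "audio" not in found:
        return True  # assumption: nothing to compare, so do not flag the file
    return abs(found["video"] - found["audio"]) <= MAX_STREAM_DRIFT_S
```

If the check fails, re-encoding or trimming the longer stream before uploading is one way to bring the file within the limit.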

If your videos are in other formats or you require different options, contact us at support@twelvelabs.io.

Note

If you upload files using publicly accessible URLs, use direct links to raw video files that play without user interaction or custom video players (example: https://example.com/videos/sample-video.mp4). Video hosting platforms like YouTube and cloud storage sharing links are not supported.
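
The following Python sketch shows one heuristic for catching obviously unsuitable URLs before submitting them. It is only a local pre-check and cannot confirm that a link actually serves a raw video file. The function name, the extension list, and the host list are illustrative assumptions rather than an official allowlist or blocklist.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumption: a small, incomplete sample of common video file extensions.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".mkv", ".webm", ".avi"}
# Assumption: example hosting and sharing hosts, chosen for illustration only.
UNSUPPORTED_HOSTS = {"youtube.com", "youtu.be", "vimeo.com", "drive.google.com", "dropbox.com"}


def looks_like_direct_video_url(url: str) -> bool:
    """Heuristically decide whether a URL points straight at a video file."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    host = parts.hostname  # urlparse already lowercases the hostname
    if any(host == h or host.endswith("." + h) for h in UNSUPPORTED_HOSTS):
        return False
    # A direct link normally ends in a file name with a video extension.
    return PurePosixPath(parts.path).suffix.lower() in VIDEO_EXTENSIONS
```

With these assumptions, the example URL from the note passes the check, while a YouTube watch page or a link without a file extension does not.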

Supported languages

Pegasus supports the following languages for processing visual and audio content, understanding prompts, and generating outputs:

Full support: English
Partial support: Arabic, Chinese, French, German, Italian, Japanese, Korean, Portuguese, Russian, Spanish, Thai, Vietnamese

Examples

Summarizing educational videos

This example prompt summarizes an educational video without any customization.

Summarize this video

This example prompt generates a caption for a social media post.

Generate an attention-grabbing caption for a social media post. Keep it shorter than 200 characters.

This example prompt creates a table of contents detailing the main sections.

Provide a table of contents detailing the main sections of this video.

Company-wide memo

This example prompt generates a company-wide memo.

Generate a company-wide memo based on the content of this video.

Video annotations

This example prompt identifies and lists key visual elements, scene changes, and notable events, briefly describing each.

Identify and list key visual elements, scene changes, and notable events in the video, briefly describing each.

Video question answering

This example prompt identifies the key takeaways.

What are the key takeaways of this video?

Timestamp breakdown

This example prompt lists all timestamps in an advertisement where a close-up of the product appears.

Tell me all the timestamps in the advertisement where a close-up of the product appears.

Police report

This example prompt creates a police report for a video showing a robbery and asks for the exact time range in which the suspect appears.

Create a police report based on what happened in the video. Provide the exact time range where the suspect appears in the video.

Using different languages

Spanish

This example prompt asks for a summary of a video in Spanish. Note that the prompt is in English, and the output is in Spanish.

Write a summary in Spanish.

French

This example prompt summarizes the three main takeaways of a video. Note that the prompt and the output are in French.

Résumez les trois principaux points à retenir de cette vidéo

Support

For support or feedback regarding Pegasus, contact support@twelvelabs.io.