Open-ended text

Use the /generate endpoint to generate open-ended text from videos that is more customizable and tailored than the results provided by the /summarize endpoint. This endpoint can generate diverse results based on your prompts, including, but not limited to, tables of contents, action items, memos, reports, and comprehensive analyses.

Below are some examples of prompts tailored to generate specific content types:

  • Table of contents: Provide a table of contents detailing the main sections of this video.
  • Action items: Identify and list all the action items assigned to each team member.
  • Memo: Generate a company-wide memo based on the announcements made in the video.
  • Police report: Write a police report based on this video using the following example:
    Date: 11/01/2020
    Location: San Francisco Police Department
    Witnesser’s full name: John Smith
    Reporter: Barbara Lim
    On 11/01/2020 around 5 PM, I saw a suspect walking in a retail store on Height Street…
  • Meeting minutes: Generate detailed meeting minutes from this video, including discussion points, decisions made, and follow-up actions assigned.
  • Video annotations: Identify and list key visual elements, scene changes, and notable events in the video, briefly describing each.
  • Video question answering: What are the key takeaways of this video? What is the creative approach of this video?

For a description of each field in the request and response, see the API Reference > Generate open-ended texts page.

Prerequisites

The examples in this guide assume the following:

  • You’re familiar with the concepts that are described on the Platform overview page.
  • You’ve already created an index and the Pegasus video understanding model is enabled for this index.
  • You've uploaded a video, and the platform has finished indexing it.

Examples

When generating open-ended texts, the platform streams responses by default. Streaming enables real-time processing of partial results, enhances the user experience with immediate feedback, and significantly reduces perceived latency.

For a description of each field in the request and response, see the API Reference > Open-ended texts page.

You can choose whether the model generates streaming responses or non-streaming responses. For details, see one of the sections below:

Streaming responses

For streaming responses, invoke the text_stream method (Python SDK) or the textStream method (Node.js SDK) of the generate object. The response consists of a stream of JSON objects, each on its own line, following the NDJSON format. Each object represents an event in the generation process, and there are three event types:

  • stream_start: Indicates the beginning of the stream. When you receive this event, initialize your processing logic.
    Example:
    {
      "event_type": "stream_start",
      "metadata": {
        "generation_id": "2f6d0bdd-aed8-47b1-8124-3c9d8006cdc9"
      }
    }
    
  • text_generation: Contains a fragment of generated text. As text_generation events arrive, handle the text fragments based on your application's needs. This might involve displaying the text in real-time, analyzing it, or storing it for later use. Note that these fragments may be of varying lengths and are not guaranteed to align with word or sentence boundaries.
    Example:
    {
      "event_type": "text_generation",
      "text": "Dive into the delightful world"
    }
    
  • stream_end: Indicates the end of the stream. When you receive this event, finalize your processing logic.
    Example:
    {
      "event_type": "stream_end",
      "metadata": {
        "generation_id": "2f6d0bdd-aed8-47b1-8124-3c9d8006cdc9"
      }
    }
    
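If you consume the endpoint over HTTP directly rather than through an SDK, you can dispatch on the event_type field yourself. Below is a minimal sketch in Python; the process_stream function and the sample data are illustrative, not part of the SDK, and the raw response body is assumed to arrive as an iterable of NDJSON lines:

```python
import json

def process_stream(lines):
    """Dispatch NDJSON events from a streaming /generate response.

    `lines` is any iterable of NDJSON strings, one JSON object per line.
    Returns the full text assembled from the text_generation fragments.
    """
    fragments = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines, if any
        event = json.loads(line)
        if event["event_type"] == "stream_start":
            fragments = []  # initialize processing state
        elif event["event_type"] == "text_generation":
            fragments.append(event["text"])  # may split mid-word
        elif event["event_type"] == "stream_end":
            break  # finalize processing
    return "".join(fragments)

# Simulated stream matching the event examples above
sample = [
    '{"event_type": "stream_start", "metadata": {"generation_id": "2f6d0bdd-aed8-47b1-8124-3c9d8006cdc9"}}',
    '{"event_type": "text_generation", "text": "Dive into the "}',
    '{"event_type": "text_generation", "text": "delightful world"}',
    '{"event_type": "stream_end", "metadata": {"generation_id": "2f6d0bdd-aed8-47b1-8124-3c9d8006cdc9"}}',
]
print(process_stream(sample))  # Dive into the delightful world
```

Because fragments are not guaranteed to align with word or sentence boundaries, they are concatenated without separators.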

To use streaming responses in your application:

  1. Start a stream by invoking the text_stream (Python) or textStream (Node.js) method of the generate object with the following parameters:
    • video_id: A string representing the unique identifier of your video.
    • prompt: A string that guides the model on the desired format or content.
  2. Use a loop to iterate over the stream.
  3. Inside the loop, handle each text fragment as it arrives. This example prints each fragment to the standard output.
  4. (Optional) After the stream ends, use the aggregated_text (Python) or aggregatedText (Node.js) field of the stream object if you need the full generated text.

The example code below demonstrates using the SDKs to generate and process a streaming response. It starts a stream for a specified video and prompt, prints each text fragment as it arrives, and prints the complete aggregated text. Ensure you replace the placeholders surrounded by <> with your values.

Python

from twelvelabs import TwelveLabs

client = TwelveLabs(api_key="<YOUR_API_KEY>")

text_stream = client.generate.text_stream(
    video_id="<YOUR_VIDEO_ID>",
    prompt="<YOUR_PROMPT>"
)

for text in text_stream:
    print(text)

print(f"Aggregated text: {text_stream.aggregated_text}")

Node.js

import { TwelveLabs } from 'twelvelabs-js';

const client = new TwelveLabs({ apiKey: '<YOUR_API_KEY>'});

const textStream = await client.generate.textStream({
  videoId: '<YOUR_VIDEO_ID>',
  prompt: '<YOUR_PROMPT>',
});

for await (const text of textStream) {
  console.log(text);
}

console.log(`Aggregated text: ${textStream.aggregatedText}`);

The output should look similar to the following:

This
 video charmingly captures the
 whims
ical and playful nature of
 cats engaging
 in a variety of activities
,
 from frolicking and
 exploring
 to moments of relaxation and
 quirky
 interactions with their environment.
 It highlights their
 curious behaviors and the
 joy they bring to everyday
 scenes.
Aggregated text: This video charmingly captures the whimsical and playful nature of cats engaging in a variety of activities, from frolicking and exploring to moments of relaxation and quirky interactions with their environment. It highlights their curious behaviors and the joy they bring to everyday scenes.
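In the sample output above, each fragment appears on its own line because print appends a newline after every call. To render the stream as continuous text instead, suppress the trailing newline. A minimal sketch, using a hardcoded list of fragments in place of a live stream:

```python
# Simulated fragments standing in for a live text_stream; as in the
# sample output above, fragments may split mid-word.
fragments = ["This", " video charmingly captures the", " whims", "ical and playful nature of"]

for text in fragments:
    # end="" suppresses the newline; flush=True displays text immediately.
    print(text, end="", flush=True)
print()  # single newline once the stream is exhausted
```

The same two-argument print call can be dropped into the SDK streaming loop shown earlier.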

Non-streaming responses

For non-streaming responses, invoke the text method of the generate object. The following example generates a brief summary with a specific format by invoking the text method with the following parameters:

  • video_id: A string representing the unique identifier of the video for which you want to generate text.
  • prompt: A string that guides the model on the desired format or content.

Python

from twelvelabs import TwelveLabs

client = TwelveLabs(api_key="<YOUR_API_KEY>")

res = client.generate.text(
  video_id="<YOUR_VIDEO_ID>",
  prompt="I want to generate a description for my video with the following format: Title of the video, followed by a summary in 2-3 sentences, highlighting the main topics."
)
print(f"{res.data}")

Node.js

import { TwelveLabs } from 'twelvelabs-js';

const client = new TwelveLabs({ apiKey: '<YOUR_API_KEY>'});

const text = await client.generate.text(
  '<YOUR_VIDEO_ID>',
  'I want to generate a description for my video with the following format: Title of the video, followed by a summary in 2-3 sentences, highlighting the main topics.',
);
console.log(`${text.data}`);

The output should be similar to the following:

Title: A Summer Day in Minnesota: College Graduation, Sun, Shopping, and Pennyboarding
Summary: In this video, a woman shares her summer day in Minnesota after her college graduation. She vlogs about her temporary move back home, showing her childhood home and expressing her love for getting some sun. The video captures various activities, including applying sunscreen, discovering a foul smell in her car, a shopping haul from favorite stores, the preparation of a bread salad, and meeting up with a friend to go pennyboarding at a parking garage. It's a fun and eventful day filled with sunshine, shopping, and outdoor adventures.

The following example generates a police report based on the provided template:

Python

from twelvelabs import TwelveLabs

client = TwelveLabs(api_key="<YOUR_API_KEY>")

res = client.generate.text(
  video_id="<YOUR_VIDEO_ID>",
  prompt="Write a police report based on this video with the following example:\nDate: \n11/01/2020\nLocation: San Francisco Police Department\nWitnesser’s full name: John Smith\nReporter: Barbara Lim\n\nOn 11/01/2020 around 5 PM, I saw a suspect walking in a retail store on Height Street…"
)
print(f"{res.data}")

Node.js

import { TwelveLabs } from 'twelvelabs-js';

const client = new TwelveLabs({ apiKey: '<YOUR_API_KEY>'});

const text = await client.generate.text(
  '<YOUR_VIDEO_ID>',
  'Write a police report based on this video with the following example:\nDate: \n11/01/2020\nLocation: San Francisco Police Department\nWitnesser’s full name: John Smith\nReporter: Barbara Lim\n\nOn 11/01/2020 around 5 PM, I saw a suspect walking in a retail store on Height Street…',
);
console.log(`${text.data}`);

The output should be similar to the following:

Date: 11/01/2020
Location: San Francisco Police Department
Witness's full name: John Smith
Reporter: Barbara Lim

On 11/01/2020 around 5 PM, I, John Smith, witnessed a suspect walking in a retail store on 
Height Street. The suspect was observed stealing items from the store, including an item 
directly from the cash register. Two other individuals were also seen engaging in theft within 
the store.

The video evidence obtained from the store's surveillance cameras clearly captures the suspect's 
actions. The suspect was seen walking through the store and discreetly taking items without being 
noticed by anyone. Additionally, the video shows two other individuals stealing multiple items 
from the store before leaving.

One particular moment in the video shows a woman entering the camera's view, picking up a bottle 
of alcohol from the shelf, and putting it inside her bag. This incident adds to the evidence of 
theft within the store.

Based on the video footage and witness testimony, it is evident that multiple instances of theft 
occurred within the retail store on Height Street. The stolen items include those taken directly 
from the cash register, as well as various other items throughout the store.

We request further investigation into this matter to identify and apprehend the suspects involved 
in these thefts. The video evidence should be analyzed thoroughly to assist in the identification 
and prosecution of the individuals responsible.


Witnesser: John Smith
Reporter: Barbara Lim

The following example displays the key takeaways of a video:

Python

from twelvelabs import TwelveLabs

client = TwelveLabs(api_key="<YOUR_API_KEY>")

res = client.generate.text(
  video_id="<YOUR_VIDEO_ID>",
  prompt="What are the key takeaways of this video?"
)
print(f"{res.data}")

Node.js

import { TwelveLabs } from 'twelvelabs-js';

const client = new TwelveLabs({ apiKey: '<YOUR_API_KEY>'});

const text = await client.generate.text(
  '<YOUR_VIDEO_ID>',
  'What are the key takeaways of this video?',
);
console.log(`${text.data}`);

The output should be similar to the following:

The key takeaways from the video are as follows:
Good posture is crucial for maintaining physical and mental health.
Poor posture can lead to discomfort, impaired body mechanics, and musculoskeletal issues.
Maintaining proper postural alignment is essential for overall physical well-being.
Posture can affect emotional state, sensitivity to pain, and overall balance.
Prolonged awkward positions and looking downwards while using electronic devices can lead to 
musculoskeletal problems.
Proper spinal alignment and understanding the structure of the spine are important for preventing issues.
Babies develop more curves in their spine as their muscles strengthen, enabling them to stay upright.
Posture plays a role in reducing stress and maintaining alignment.
Good postural alignment is essential while sitting, especially for those working at a computer.
Tips for maintaining good posture include using ergonomic aids, wearing suitable footwear, and 
keeping muscles and joints active.
Regular exercise and using muscles effectively support proper posture.
Consulting a physical therapist can provide guidance on proper postural alignment.

The following example identifies the creative approach of a video:

Python

from twelvelabs import TwelveLabs

client = TwelveLabs(api_key="<YOUR_API_KEY>")

res = client.generate.text(
  video_id="<YOUR_VIDEO_ID>",
  prompt="What is the creative approach of this video?"
)
print(f"{res.data}")

Node.js

import { TwelveLabs } from 'twelvelabs-js';

const client = new TwelveLabs({ apiKey: '<YOUR_API_KEY>'});

const text = await client.generate.text(
  '<YOUR_VIDEO_ID>',
  'What is the creative approach of this video?',
);
console.log(`${text.data}`);

The output should be similar to the following:

The creative approach of the video is to showcase a "Joyful Journey" theme by featuring a man 
exploring different locations and opening doors to reveal various settings. The video transitions 
between scenes of a jungle, an industrial area, and a hillside with water falling from above. 
It also includes animated scenes and people enjoying Coca-Cola drinks together. The advertisement 
aims to convey a sense of joy and togetherness associated with drinking Coca-Cola.