Use prompts to analyze video content and generate text outputs. Prompt engineering is the process of iteratively refining the instructions or questions you give the model to improve the quality, relevance, and precision of its responses. It makes the model more effective across use cases, from content creation and summarization to question answering.
These tips apply to both Pegasus 1.2 and Pegasus 1.5 general analysis. Pegasus 1.5 also supports the prompt_v_2 parameter for structured prompts with reference images. For details, see the Python SDK Reference or Node.js SDK Reference.
The typical steps involved in prompt engineering are as follows:
No single method produces the perfect prompt, because the effectiveness of a given approach varies widely with the task. The tips in this section can still improve your prompt writing. Experiment with them to find the approaches that yield more accurate and relevant responses.
Examples guide the model toward the expected output, reduce ambiguity, and help the platform generate relevant responses. The following example creates a police report based on surveillance footage. It includes an example of a similar report to guide the model’s response.
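One way to keep the task and its sample output separate is a small prompt builder. The sketch below is illustrative only: `build_prompt_with_example` and the sample report are hypothetical, not part of the platform or its SDKs, and the code only assembles the prompt string without calling the model.

```python
def build_prompt_with_example(task: str, example: str) -> str:
    """Combine a task instruction with one sample output for the model to imitate."""
    return (
        f"{task.strip()}\n\n"
        "Follow the structure and tone of this example:\n"
        f"{example.strip()}"
    )


# Hypothetical sample report; replace it with one that matches your use case.
EXAMPLE_REPORT = """\
Incident: Shoplifting
Time: 14:32
Description: A person in a red jacket concealed an item and left without paying."""

prompt = build_prompt_with_example(
    "Write a police report based on this surveillance footage.",
    EXAMPLE_REPORT,
)
print(prompt)
```

Keeping the example in its own variable makes it easy to swap in a different sample and compare the responses.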
Providing context in prompts helps the platform understand your requirements and tailor the response to your needs, which reduces the chance of irrelevant output. The following example supplies the context needed to customize the generated response.
Specificity guides the model in producing highly relevant and targeted responses by aligning the output with your intentions. The following example states the exact task you want the model to perform: creating a daily workout plan for this week based on the workout routine mentioned in the video. This helps the model understand the scope of the prompt and generate a targeted response.
Based on your requirements, differentiate between question-answering and description-based prompts, as each will guide the model’s focus differently. The example prompt below is phrased as a question and instructs the model to list the filming techniques used in a video.
Clearly state the desired output’s length, style, and format (examples: JSON format, email) to ensure the output meets your requirements. The example below summarizes a video as an email, focusing on the five most important points.
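When you request JSON, it helps to name the keys in the prompt and to validate the response before using it. The following is a minimal sketch under stated assumptions: the key names, `FORMAT_PROMPT`, `parse_json_response`, and the sample response are all hypothetical, and the model is not actually called.

```python
import json

# Hypothetical prompt that states the format, the keys, and the length.
FORMAT_PROMPT = (
    "Summarize this video as JSON with exactly two keys: "
    '"title" (a string) and "key_points" (a list of five short strings). '
    "Return only the JSON object."
)


def parse_json_response(text: str) -> dict:
    """Extract and parse the first-to-last brace span of a model response."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in response")
    return json.loads(text[start : end + 1])


# Made-up response text, used only to show the parsing step.
sample_response = 'Here is the summary:\n{"title": "Demo", "key_points": ["a", "b"]}'
summary = parse_json_response(sample_response)
print(summary["title"])
```

Slicing from the first `{` to the last `}` tolerates extra text around the object, but it is a simple heuristic and will fail if the response contains more than one JSON object.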
Specify if you want the output in a different language. The following example summarizes a video, indicating that the response should be in Spanish.
Being concise helps the model focus on the essential information. This speeds up processing and increases the likelihood of generating precise, relevant responses.
The temperature controls the randomness of the text output. A lower temperature produces more deterministic output, which is ideal for tasks requiring high accuracy and specificity. In contrast, a higher temperature produces more creative text, which is suitable for brainstorming or creative writing tasks. Experiment with this setting to find the balance that meets your objectives. For details, see the Tune the temperature page.
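To compare temperatures, you can send the same prompt with different settings and inspect the results side by side. The sketch below only builds the request parameters. `analysis_params` is a hypothetical helper, and the accepted range of 0 to 1 is an assumption; check the Tune the temperature page for the actual limits and default.

```python
def analysis_params(prompt: str, temperature: float) -> dict:
    """Build a parameter dict, rejecting temperatures outside the assumed range."""
    # Assumption: the platform accepts values from 0 to 1 inclusive.
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0 and 1")
    return {"prompt": prompt, "temperature": temperature}


# Low temperature for a factual task, higher temperature for a creative one.
factual = analysis_params("List every product shown in this video.", 0.1)
creative = analysis_params("Suggest five catchy titles for this video.", 0.8)
print(factual)
print(creative)
```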
Objective: Generate increasingly detailed video descriptions by iteratively refining your prompts from a general summary to timestamped scene breakdowns.
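The progression from a general summary to a timestamped breakdown can be sketched as an ordered list of prompts sent one after another. The prompts, `run_refinement`, and `fake_analyze` below are hypothetical; `fake_analyze` stands in for a real model call so that the sketch runs offline.

```python
from typing import Callable, List, Tuple

# Each prompt adds detail or structure to the previous one.
REFINEMENT_STEPS = [
    "Summarize this video in two sentences.",
    "Describe this video scene by scene, one paragraph per scene.",
    "Describe this video scene by scene. Start each scene with its start "
    "and end timestamps in MM:SS format.",
]


def run_refinement(
    analyze: Callable[[str], str], prompts: List[str]
) -> List[Tuple[str, str]]:
    """Send each prompt in order and keep (prompt, response) pairs for comparison."""
    return [(p, analyze(p)) for p in prompts]


def fake_analyze(prompt: str) -> str:
    """Stand-in for a real model call; echoes the prompt it received."""
    return f"[response to: {prompt}]"


history = run_refinement(fake_analyze, REFINEMENT_STEPS)
for prompt, response in history:
    print(prompt, "->", response)
```

Keeping every prompt and response pair makes it easier to see which refinement actually improved the output.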
Objective: Locate specific moments in a video by progressively narrowing your search criteria from broad segments to exact timestamps.
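Once a prompt asks for exact timestamps, the response can be converted into seconds for seeking or clipping. This sketch assumes the prompt requested ranges in `MM:SS-MM:SS` form; the format, `extract_ranges`, and the sample text are assumptions made for illustration, not a documented output format.

```python
import re
from typing import List, Tuple

# Matches ranges such as "12:05-12:20" or "12:40 - 13:10".
TIMESTAMP_RANGE = re.compile(r"(\d{1,2}):(\d{2})\s*-\s*(\d{1,2}):(\d{2})")


def extract_ranges(text: str) -> List[Tuple[int, int]]:
    """Return (start_seconds, end_seconds) for every MM:SS-MM:SS range in text."""
    ranges = []
    for m1, s1, m2, s2 in TIMESTAMP_RANGE.findall(text):
        ranges.append((int(m1) * 60 + int(s1), int(m2) * 60 + int(s2)))
    return ranges


# Made-up response text, used only to show the parsing step.
sample = "The goal is scored at 12:05-12:20, and the replay runs 12:40 - 13:10."
print(extract_ranges(sample))  # prints [(725, 740), (760, 790)]
```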
Objective: Extract a complete, formatted recipe from a cooking video by iteratively adding detail and structure to your prompts.
Objective: Transform video content into a structured workout plan by extracting information, organizing it, and formatting it for a specific output format.