Dataset examples

A good fine-tuning dataset improves the performance of the model in specific domains. This section provides examples of effective datasets across various fields or industries. Each example includes the following elements:

  • Domain: The specific field or industry.
  • Dataset content: Types of data to include.
  • Annotation guidance: How to label or annotate the data.
  • Specific examples: Detailed instances within the domain.

Healthcare

Domain: Different surgeries and medical procedures.
Dataset content: Include diverse videos showing different surgeries and procedures.
Annotation guidance: Identify key steps, label medical tools, and define basic terms.
Specific examples:

  • Procedure: Cardiac catheterization.
    • Key steps: Patient preparation, catheter insertion, contrast dye injection, image acquisition, etc.
    • Annotation: Mark procedural phases, label catheter types, and identify anatomical landmarks.
  • Procedure: Annual health check-up.
    • Key steps: Check vital signs, examine body systems, discuss health history, etc.
    • Annotation: Mark examination steps, label medical tools, and identify patient-doctor interactions.

Sports

Domain: Various sporting events and activities
Dataset content: Include videos of different sports, focusing on specific actions, plays, and rules.
Annotation guidance: Label techniques, mark key events, and annotate rule applications.
Specific examples:

  • American football: Stiff arm.
    • Definition: Defensive maneuver to fend off a tackler.
      Annotation: Label the technique, ensuring spatial and temporal tightness.
  • Ice hockey: Slapshot
    • Definition: A powerful shot where the player swings their stick in a wide arc before striking the puck.
    • Annotation: Label the technique, ensuring spatial and temporal tightness.
  • Basketball: Pick-and-roll
    • Definition: A play involving a player setting a screen (pick) for a teammate and then moving toward the basket (roll).
    • Annotation: Label the technique, ensuring spatial and temporal tightness.

Media & Entertainment (M&E)

Domain: Television shows, movies, and live performances.
Dataset content: Include a variety of videos, such as TV episodes, movie scenes, music videos, and live concert recordings.
Annotation Guidance: Identify key scenes, label dialogue segments, and identify special effects.
Specific Examples:

  • Scenario: Golden Buzzer Moment from talent shows like "America's Got Talent"
    • Key Elements: Performance build-up, emotional peak, judges' and host's reactions, special effects.
    • Annotation: Label the exact moment the golden buzzer is pressed, the visual effects (confetti, lighting), and the ensuing reactions.
  • Scenario: Dramatic TV series scene "Big Reveal Moment"
    • Key elements: The moment when a major plot twist or hidden truth is revealed, drastically changing the narrative or characters' understanding.
    • Annotation: Label the suspenseful dialogue and visual cues leading up to the reveal, and mark the exact moment the critical information is disclosed