Data preparation

Fine-tuning a base model requires both a training dataset and a validation dataset. The training dataset is used to teach the model the desired concepts and actions, while the validation dataset is used to assess the model's performance and generalization ability on unseen data.

Typically, the amount of data required for fine-tuning a base model depends on the following factors:

  • Data quality: Higher-quality data with tight, noise-free annotations may require fewer samples for effective fine-tuning.
  • Task complexity: Complex concepts or actions may require more data to capture the full range of variations.

Twelve Labs recommends that you provide at least ten samples for the training dataset and use an 80:20 split between the training and validation sets. The validation set should test the decision boundary well: it should contain diverse positive videos and hard negative videos, and it should match the distribution you expect in practical usage.
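
The 80:20 split can be sketched as follows. This is a minimal illustration, not part of any Twelve Labs tooling: the function name split_annotations, its parameters, and the choice to shuffle individual annotations (rather than whole videos) are assumptions made for the example.

```python
import random


def split_annotations(rows, train_fraction=0.8, seed=0):
    """Shuffle rows and split them into (training, validation) lists.

    `rows` is any sequence of annotations; `train_fraction` and `seed`
    are hypothetical parameters chosen for this sketch.
    """
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    # Keep at least one row on each side when there are two or more rows.
    if len(shuffled) >= 2:
        cut = min(max(cut, 1), len(shuffled) - 1)
    return shuffled[:cut], shuffled[cut:]
```

A random split only gives a starting point. Because the validation set should contain hard negatives and reflect practical usage, you may still need to review and adjust it by hand.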

To ensure successful fine-tuning, the following data requirements must be met:

  • The training data must consist of raw videos rather than clipped or edited videos.
  • The training data must be in a CSV file. Each line in the CSV file represents a single annotation, with the following fields separated by commas:
    • <video_url>: The publicly accessible URL of the raw video. Note that YouTube URLs are not supported.
    • <start_time>: The start time of the relevant segment, expressed in seconds from the beginning of the video.
    • <end_time>: The end time of the relevant segment, expressed in seconds from the beginning of the video.
    • <label_or_description>: The label or description of the concept or action occurring in the specified segment.
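
To make the CSV layout concrete, the sketch below checks each line against the four fields listed above before you submit the file. It is an illustration only: the function names, the example URLs, and the specific checks (an http or https scheme, a start time earlier than the end time, a host-based YouTube test) are assumptions, and the sketch assumes the file has no header row.

```python
import csv
import io
from urllib.parse import urlparse


def validate_row(row):
    """Return a list of problems found in one CSV row (empty if none)."""
    if len(row) != 4:
        return [f"expected 4 fields, got {len(row)}"]
    url, start, end, label = (field.strip() for field in row)
    errors = []

    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if parsed.scheme not in ("http", "https") or not host:
        errors.append("video_url is not an http(s) URL")
    elif host in ("youtu.be", "youtube.com") or host.endswith(".youtube.com"):
        errors.append("YouTube URLs are not supported")

    try:
        start_s, end_s = float(start), float(end)
    except ValueError:
        errors.append("start_time and end_time must be numbers (seconds)")
    else:
        # Assumption: a valid segment starts at or after 0 and has positive length.
        if start_s < 0 or end_s <= start_s:
            errors.append("expected 0 <= start_time < end_time")

    if not label:
        errors.append("label_or_description is empty")
    return errors


def validate_csv(text):
    """Return {line_number: [problems]} for every invalid line in `text`."""
    problems = {}
    for number, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        errors = validate_row(row)
        if errors:
            problems[number] = errors
    return problems


# Hypothetical annotations: the first line is valid, the other two are not.
SAMPLE = (
    "https://example.com/videos/match_01.mp4,12.5,18.0,corner kick\n"
    "https://www.youtube.com/watch?v=abc,3,9,goal celebration\n"
    "https://example.com/videos/match_02.mp4,40,35,free kick\n"
)
```

Calling validate_csv(SAMPLE) flags line 2 for its YouTube URL and line 3 because its end time precedes its start time. Python's csv module also handles quoted fields, so a description that contains a comma can be wrapped in double quotes; whether the fine-tuning service accepts quoted fields is not stated here and should be confirmed.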