Databricks - Advanced video understanding

Summary: This integration combines Twelve Labs' Embed API with Databricks Mosaic AI Vector Search to create advanced video understanding applications. It transforms video content into multimodal embeddings that capture the relationships between visual expressions, body language, spoken words, and overall context, enabling powerful similarity search and recommendation systems.

Description: Integrating Twelve Labs with Databricks Mosaic AI addresses key challenges in video AI, such as efficient processing of large-scale video datasets and accurate multimodal content representation. The process involves the following main steps:

  1. Generate multimodal embeddings from video content using Twelve Labs' Embed API
  2. Store these embeddings along with video metadata in a Delta Table
  3. Configure Mosaic AI Vector Search with a Delta Sync Index to access the embeddings (steps 2 and 3 are sketched in code after this list)
  4. Generate text embeddings for search queries
  5. Perform similarity searches between text queries and video content
  6. Build a video recommendation system that suggests videos similar to a given video based on embedding similarities
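A condensed sketch of steps 2 and 3, assuming the embeddings have been written to a videos_source_embeddings Delta table (as in the code later on this page). The catalog, schema, endpoint, and index names are illustrative, and Marengo-retrieval-2.6 produces 1024-dimensional vectors:

from databricks.vector_search.client import VectorSearchClient

# Step 2: enable Change Data Feed so a Delta Sync Index can pick up changes
spark.sql(
    "ALTER TABLE main.default.videos_source_embeddings "
    "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"
)

# Step 3: create a Delta Sync Index over the embeddings table
mosaic_client = VectorSearchClient()
index = mosaic_client.create_delta_sync_index(
    endpoint_name="video_search_endpoint",              # illustrative name
    index_name="main.default.video_embeddings_index",   # illustrative name
    source_table_name="main.default.videos_source_embeddings",
    pipeline_type="TRIGGERED",
    primary_key="id",
    embedding_dimension=1024,   # Marengo-retrieval-2.6 embedding size
    embedding_vector_column="embedding",
)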

Step-by-step guide: Our blog post, Mastering Multimodal AI: Advanced Video Understanding with Twelve Labs + Databricks Mosaic AI, guides you through setting up the environment, generating embeddings, and implementing the similarity search and recommendation functionalities.

Integration with Twelve Labs

This section describes how you can use the Twelve Labs Python SDK to create embeddings. The integration involves creating two types of embeddings:

  • Video embeddings from your video content
  • Text embeddings from queries

Video embeddings

The get_video_embeddings function is a Pandas UDF that generates multimodal embeddings using the Twelve Labs Embed API:

from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import ArrayType, FloatType
from twelvelabs import TwelveLabs
import pandas as pd

@pandas_udf(ArrayType(FloatType()))
def get_video_embeddings(urls: pd.Series) -> pd.Series:
    def generate_embedding(video_url):
        # TWELVE_LABS_API_KEY is assumed to be defined during environment setup
        twelvelabs_client = TwelveLabs(api_key=TWELVE_LABS_API_KEY)
        # Create an embedding task and block until it completes
        task = twelvelabs_client.embed.task.create(
            engine_name="Marengo-retrieval-2.6",
            video_url=video_url
        )
        task.wait_for_done()
        task_result = twelvelabs_client.embed.task.retrieve(task.id)
        embeddings = []
        for v in task_result.video_embeddings:
            embeddings.append({
                'embedding': v.embedding.float,
                'start_offset_sec': v.start_offset_sec,
                'end_offset_sec': v.end_offset_sec,
                'embedding_scope': v.embedding_scope
            })
        return embeddings

    def process_url(url):
        # Keep only the first returned embedding; None if the task produced none
        embeddings = generate_embedding(url)
        return embeddings[0]['embedding'] if embeddings else None

    return urls.apply(process_url)
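
A brief usage sketch: applying the UDF to a source DataFrame and persisting the result as the videos_source_embeddings table that the recommendation code below reads. The videos_raw table and its columns are illustrative:

from pyspark.sql.functions import col

# Attach an embedding to each video row and persist the result
videos_df = spark.table("videos_raw")  # hypothetical table with id, url, title
videos_df.withColumn("embedding", get_video_embeddings(col("url"))) \
    .write.mode("overwrite").saveAsTable("videos_source_embeddings")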

For details on creating video embeddings, see the Create video embeddings page.

Text embeddings

The get_text_embedding function generates text embeddings:

def get_text_embedding(text_query):
    # The Twelve Labs Embed API also supports text-to-embedding;
    # twelvelabs_client is the client created during setup
    text_embedding = twelvelabs_client.embed.create(
        engine_name="Marengo-retrieval-2.6",
        text=text_query,
        text_truncate="start"
    )

    return text_embedding.text_embedding.float
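
A quick check of the output; the query text is illustrative:

query_embedding = get_text_embedding("drone footage of a coastline at sunset")
print(len(query_embedding))  # 1024 dimensions for Marengo-retrieval-2.6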

For details on creating text embeddings, see the Create text embeddings page.

Similarity search

The similarity_search function generates an embedding for a text query, and uses the Mosaic AI Vector Search index to find similar videos:

def similarity_search(query_text, num_results=5):
    # Initialize the Vector Search client and generate the query embedding;
    # `index` is the Delta Sync Index handle obtained during setup
    mosaic_client = VectorSearchClient()
    query_embedding = get_text_embedding(query_text)

    print(f"Query embedding generated: {len(query_embedding)} dimensions")

    # Perform the similarity search
    results = index.similarity_search(
        query_vector=query_embedding,
        num_results=num_results,
        columns=["id", "url", "title"]
    )
    return results
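
The raw response is a dictionary rather than a list of records, and the recommendation code below relies on a parse_search_results helper to flatten it. That helper is not shown in this excerpt, so here is a minimal sketch, assuming the databricks-vectorsearch response layout in which each row of result.data_array holds the requested columns in order, followed by the similarity score:

def parse_search_results(results, columns=("id", "url", "title")):
    """Flatten a raw Mosaic AI Vector Search response into a list of dicts."""
    rows = results.get("result", {}).get("data_array", [])
    parsed = []
    for row in rows:
        record = dict(zip(columns, row))
        record["score"] = row[-1]  # the similarity score is the last field
        parsed.append(record)
    return parsed

# Example: search for videos matching a natural-language query (illustrative)
results = similarity_search("people discussing climate change", num_results=3)
for video in parse_search_results(results):
    print(video["title"], video["url"], video["score"])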

Video recommendation

The get_video_recommendations function takes a video ID and the number of recommendations to return, and performs a similarity search to find the most similar videos:

def get_video_recommendations(video_id, num_recommendations=5):
    # Initialize the Vector Search client
    mosaic_client = VectorSearchClient()

    # First, retrieve the embedding for the given video_id
    source_df = spark.table("videos_source_embeddings")
    video_embedding = source_df.filter(f"id = {video_id}").select("embedding").first()

    if not video_embedding:
        print(f"No video found with id: {video_id}")
        return []

    # Perform similarity search using the video's embedding
    try:
        results = index.similarity_search(
            query_vector=video_embedding["embedding"],
            num_results=num_recommendations + 1,  # +1 to account for the input video
            columns=["id", "url", "title"]
        )
        
        # Parse the results
        recommendations = parse_search_results(results)
        
        # Remove the input video from recommendations if present
        recommendations = [r for r in recommendations if r.get('id') != video_id]
        
        return recommendations[:num_recommendations]
    except Exception as e:
        print(f"Error during recommendation: {e}")
        return []

# Helper function to display recommendations
def display_recommendations(recommendations):
    if recommendations:
        print(f"Top {len(recommendations)} recommended videos:")
        for i, video in enumerate(recommendations, 1):
            print(f"{i}. Title: {video.get('title', 'N/A')}")
            print(f"   URL: {video.get('url', 'N/A')}")
            print(f"   Similarity Score: {video.get('score', 'N/A')}")
            print()
    else:
        print("No recommendations found.")

# Example usage
video_id = 1  # Assuming this is a valid video ID in your dataset
recommendations = get_video_recommendations(video_id)
display_recommendations(recommendations)

Next steps

After reading this page, you have the following options:

  • Customize and use the example: After implementing the basic integration, consider these improvements:
    • Update and synchronize the index: Implement efficient incremental updates and scheduled synchronization jobs using Delta Lake features (see the sketch after this list).
    • Optimize performance and scaling: Leverage distributed processing, intelligent caching, and index partitioning for larger video libraries.
    • Monitor and analyze: Track key performance metrics, implement feedback loops, and correlate search and recommendation performance with business metrics.
  • Explore further: Try the applications built by the community or our sample applications to get more insights into the Twelve Labs Video Understanding Platform's diverse capabilities and learn more about integrating the platform into your applications.
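
For the first improvement above, a minimal sketch of a scheduled synchronization step, reusing the illustrative endpoint and index names from earlier. With a TRIGGERED pipeline, index.sync() picks up rows added to the source Delta table since the last sync:

from databricks.vector_search.client import VectorSearchClient

# Re-fetch the index handle and trigger an incremental sync
mosaic_client = VectorSearchClient()
index = mosaic_client.get_index(
    endpoint_name="video_search_endpoint",
    index_name="main.default.video_embeddings_index",
)
index.sync()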