Databricks - Advanced video understanding

Summary: This integration combines TwelveLabs’ Embed API with Databricks Mosaic AI Vector Search to create advanced video understanding applications. It transforms video content into multimodal embeddings that capture the relationships between visual expressions, body language, spoken words, and overall context, enabling powerful similarity search and recommendation systems.

Description: Integrating TwelveLabs with Databricks Mosaic AI addresses key challenges in video AI, such as efficient processing of large-scale video datasets and accurate multimodal content representation. The process involves these main steps:

  1. Generate multimodal embeddings from video content using TwelveLabs’ Embed API
  2. Store these embeddings along with video metadata in a Delta Table
  3. Configure Mosaic AI Vector Search with a Delta Sync Index to access the embeddings (see the setup sketch after this list)
  4. Generate text embeddings for search queries
  5. Perform similarity searches between text queries and video content
  6. Build a video recommendation system that suggests videos similar to a given video based on embedding similarities
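
For steps 2 and 3, a minimal setup sketch using the databricks-vectorsearch client might look like the following. The endpoint name (twelvelabs_endpoint), index name, and main.default schema are illustrative placeholders; videos_source_embeddings is the Delta table used throughout this guide, and the 1024-dimension value assumes Marengo-retrieval-2.6 embeddings.

Python
from databricks.vector_search.client import VectorSearchClient

mosaic_client = VectorSearchClient()

# Create (or reuse) a Vector Search endpoint
mosaic_client.create_endpoint(
    name="twelvelabs_endpoint",
    endpoint_type="STANDARD"
)

# Create a Delta Sync Index over the Delta table that holds the
# self-managed TwelveLabs embeddings (the "embedding" column)
index = mosaic_client.create_delta_sync_index(
    endpoint_name="twelvelabs_endpoint",
    index_name="main.default.video_embeddings_index",
    source_table_name="main.default.videos_source_embeddings",
    pipeline_type="TRIGGERED",
    primary_key="id",
    embedding_dimension=1024,  # assumes Marengo-retrieval-2.6 embedding size
    embedding_vector_column="embedding"
)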

Step-by-step guide: Our blog post, Mastering Multimodal AI: Advanced Video Understanding with TwelveLabs + Databricks Mosaic AI, guides you through setting up the environment, generating embeddings, and implementing the similarity search and recommendation functionalities.

Integration with TwelveLabs

This section describes how you can use the TwelveLabs Python SDK to create embeddings. The integration involves creating two types of embeddings:

  • Video embeddings from your video content
  • Text embeddings from queries

Video embeddings

The get_video_embeddings function is a Pandas UDF that generates multimodal embeddings using the TwelveLabs Embed API:

Python
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import ArrayType, FloatType
from twelvelabs import TwelveLabs
from twelvelabs.models.embed import EmbeddingsTask
import pandas as pd

@pandas_udf(ArrayType(FloatType()))
def get_video_embeddings(urls: pd.Series) -> pd.Series:
    def generate_embedding(video_url):
        twelvelabs_client = TwelveLabs(api_key=TWELVE_LABS_API_KEY)
        task = twelvelabs_client.embed.task.create(
            engine_name="Marengo-retrieval-2.6",
            video_url=video_url
        )
        task.wait_for_done()
        task_result = twelvelabs_client.embed.task.retrieve(task.id)
        embeddings = []
        for v in task_result.video_embeddings:
            embeddings.append({
                'embedding': v.embedding.float,
                'start_offset_sec': v.start_offset_sec,
                'end_offset_sec': v.end_offset_sec,
                'embedding_scope': v.embedding_scope
            })
        return embeddings

    def process_url(url):
        embeddings = generate_embedding(url)
        return embeddings[0]['embedding'] if embeddings else None

    return urls.apply(process_url)

For details on creating video embeddings, see the Create video embeddings page.
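
Once the UDF is defined, you can apply it to a DataFrame of video URLs and persist the results as the Delta table used in the rest of this guide. The sample rows below are placeholders; note that a Delta Sync Index requires Change Data Feed to be enabled on the source table.

Python
# Hypothetical source data; replace with your own video metadata
videos_df = spark.createDataFrame(
    [(1, "https://example.com/video1.mp4", "Video 1"),
     (2, "https://example.com/video2.mp4", "Video 2")],
    ["id", "url", "title"]
)

# Apply the Pandas UDF to generate one embedding per video
embeddings_df = videos_df.withColumn("embedding", get_video_embeddings("url"))

# Persist embeddings and metadata together in a Delta table
embeddings_df.write.format("delta").mode("overwrite").saveAsTable("videos_source_embeddings")

# Enable Change Data Feed so the Delta Sync Index can track updates
spark.sql("ALTER TABLE videos_source_embeddings SET TBLPROPERTIES (delta.enableChangeDataFeed = true)")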

Text embeddings

The get_text_embedding function generates text embeddings:

Python
def get_text_embedding(text_query):
    # TwelveLabs Embed API supports text-to-embedding
    text_embedding = twelvelabs_client.embed.create(
        engine_name="Marengo-retrieval-2.6",
        text=text_query,
        text_truncate="start"
    )

    return text_embedding.text_embedding.float

For details on creating text embeddings, see the Create text embeddings page.
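
As a quick sanity check, you can embed an arbitrary query and inspect its dimensionality (the query string below is illustrative, and twelvelabs_client must already be initialized):

Python
query_embedding = get_text_embedding("A person riding a bicycle through a city")
print(f"Embedding dimensions: {len(query_embedding)}")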

The similarity_search function generates an embedding for a text query and uses the Mosaic AI Vector Search index to find similar videos:

Python
def similarity_search(query_text, num_results=5):
    # Initialize the Vector Search client and get the query embedding
    mosaic_client = VectorSearchClient()
    query_embedding = get_text_embedding(query_text)

    print(f"Query embedding generated: {len(query_embedding)} dimensions")

    # Perform the similarity search against the Delta Sync Index
    # ("index" is the index object obtained when setting up Vector Search)
    results = index.similarity_search(
        query_vector=query_embedding,
        num_results=num_results,
        columns=["id", "url", "title"]
    )
    return results
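
The recommendation code in the next section relies on a parse_search_results helper that isn't defined in this excerpt. A minimal sketch, assuming the standard Vector Search response shape (column names in manifest.columns, rows plus a trailing similarity score in result.data_array), might look like this:

Python
def parse_search_results(results):
    # Each row in data_array holds the requested columns in order,
    # followed by the similarity score as the final "score" column
    columns = [col["name"] for col in results["manifest"]["columns"]]
    rows = results.get("result", {}).get("data_array", [])
    return [dict(zip(columns, row)) for row in rows]

# Example usage
results = similarity_search("A person riding a bicycle through a city", num_results=3)
for video in parse_search_results(results):
    print(video.get("title"), video.get("score"))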

Video recommendation

The get_video_recommendations function takes a video ID and the number of recommendations to return, then performs a similarity search to find the most similar videos:

Python
def get_video_recommendations(video_id, num_recommendations=5):
    # Initialize the Vector Search client
    mosaic_client = VectorSearchClient()

    # First, retrieve the embedding for the given video_id
    source_df = spark.table("videos_source_embeddings")
    video_embedding = source_df.filter(f"id = {video_id}").select("embedding").first()

    if not video_embedding:
        print(f"No video found with id: {video_id}")
        return []

    # Perform similarity search using the video's embedding
    try:
        results = index.similarity_search(
            query_vector=video_embedding["embedding"],
            num_results=num_recommendations + 1,  # +1 to account for the input video
            columns=["id", "url", "title"]
        )

        # Parse the results
        recommendations = parse_search_results(results)

        # Remove the input video from recommendations if present
        recommendations = [r for r in recommendations if r.get('id') != video_id]

        return recommendations[:num_recommendations]
    except Exception as e:
        print(f"Error during recommendation: {e}")
        return []

# Helper function to display recommendations
def display_recommendations(recommendations):
    if recommendations:
        print(f"Top {len(recommendations)} recommended videos:")
        for i, video in enumerate(recommendations, 1):
            print(f"{i}. Title: {video.get('title', 'N/A')}")
            print(f"   URL: {video.get('url', 'N/A')}")
            print(f"   Similarity Score: {video.get('score', 'N/A')}")
            print()
    else:
        print("No recommendations found.")

# Example usage
video_id = 1  # Assuming this is a valid video ID in your dataset
recommendations = get_video_recommendations(video_id)
display_recommendations(recommendations)

Next steps

After reading this page, you have the following options:

  • Customize and use the example: After implementing the basic integration, consider these improvements:
    • Update and synchronize the index: Implement efficient incremental updates and scheduled synchronization jobs using Delta Lake features.
    • Optimize performance and scaling: Leverage distributed processing, intelligent caching, and index partitioning for larger video libraries.
    • Monitor and analyze: Track key performance metrics, implement feedback loops, and correlate capabilities with business metrics.