Databricks - Advanced video understanding

Summary: This integration combines Twelve Labs' Embed API with Databricks Mosaic AI Vector Search to create advanced video understanding applications. It transforms video content into multimodal embeddings that capture the relationships between visual expressions, body language, spoken words, and overall context, enabling powerful similarity search and recommendation systems.
Description: Integrating Twelve Labs with Databricks Mosaic AI addresses key challenges in video AI, such as efficient processing of large-scale video datasets and accurate multimodal content representation. The process involves the following main steps:
- Generate multimodal embeddings from video content using Twelve Labs' Embed API
- Store these embeddings along with video metadata in a Delta Table
- Configure Mosaic AI Vector Search with a Delta Sync Index to access the embeddings
- Generate text embeddings for search queries
- Perform similarity searches between text queries and video content
- Build a video recommendation system that suggests videos similar to a given video based on embedding similarities
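The storage and indexing steps above (a Delta Table plus a Delta Sync Index) might look like the following sketch. All names here are illustrative assumptions: `videos_df` stands for a DataFrame holding the embeddings and metadata, and the catalog, schema, endpoint, and index names are placeholders you would replace with your own. Marengo-retrieval-2.6 produces 1024-dimensional embeddings.

```python
# Sketch: persist embeddings with metadata in a Delta Table and expose
# them through a Delta Sync Index. Table, endpoint, and index names are
# illustrative; `videos_df` is assumed to hold id, url, title, and an
# `embedding` column produced by the Twelve Labs Embed API.
from databricks.vector_search.client import VectorSearchClient

# Store embeddings and metadata in a Delta Table.
videos_df.write.format("delta").mode("overwrite").saveAsTable(
    "main.default.videos_source_embeddings"
)

# Delta Sync requires the change data feed on the source table.
spark.sql(
    "ALTER TABLE main.default.videos_source_embeddings "
    "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"
)

# Create a Delta Sync Index that tracks the table.
mosaic_client = VectorSearchClient()
index = mosaic_client.create_delta_sync_index(
    endpoint_name="video_search_endpoint",
    index_name="main.default.video_embeddings_index",
    source_table_name="main.default.videos_source_embeddings",
    pipeline_type="TRIGGERED",
    primary_key="id",
    embedding_dimension=1024,  # Marengo-retrieval-2.6 embedding size
    embedding_vector_column="embedding",
)
```

Because this sketch targets a live Databricks workspace, it is configuration rather than a runnable sample; treat the parameter values as a starting point.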
Step-by-step guide: Our blog post, Mastering Multimodal AI: Advanced Video Understanding with Twelve Labs + Databricks Mosaic AI, guides you through setting up the environment, generating embeddings, and implementing the similarity search and recommendation functionalities.
Integration with Twelve Labs
This section describes how you can use the Twelve Labs Python SDK to create embeddings. The integration involves creating two types of embeddings:
- Video embeddings from your video content
- Text embeddings from queries
Video embeddings
The `get_video_embeddings` function creates a Pandas UDF that generates multimodal embeddings using the Twelve Labs Embed API:
```python
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import ArrayType, FloatType
from twelvelabs import TwelveLabs
from twelvelabs.models.embed import EmbeddingsTask
import pandas as pd

@pandas_udf(ArrayType(FloatType()))
def get_video_embeddings(urls: pd.Series) -> pd.Series:
    def generate_embedding(video_url):
        twelvelabs_client = TwelveLabs(api_key=TWELVE_LABS_API_KEY)
        task = twelvelabs_client.embed.task.create(
            engine_name="Marengo-retrieval-2.6",
            video_url=video_url
        )
        task.wait_for_done()
        task_result = twelvelabs_client.embed.task.retrieve(task.id)
        embeddings = []
        for v in task_result.video_embeddings:
            embeddings.append({
                'embedding': v.embedding.float,
                'start_offset_sec': v.start_offset_sec,
                'end_offset_sec': v.end_offset_sec,
                'embedding_scope': v.embedding_scope
            })
        return embeddings

    def process_url(url):
        embeddings = generate_embedding(url)
        return embeddings[0]['embedding'] if embeddings else None

    return urls.apply(process_url)
```
For details on creating video embeddings, see the Create video embeddings page.
Text embeddings
The `get_text_embedding` function generates text embeddings:
```python
def get_text_embedding(text_query):
    # Twelve Labs Embed API supports text-to-embedding
    text_embedding = twelvelabs_client.embed.create(
        engine_name="Marengo-retrieval-2.6",
        text=text_query,
        text_truncate="start"
    )
    return text_embedding.text_embedding.float
```
For details on creating text embeddings, see the Create text embeddings page.
Similarity search
The `similarity_search` function generates an embedding for a text query and uses the Mosaic AI Vector Search index to find similar videos:
```python
def similarity_search(query_text, num_results=5):
    # Initialize the Vector Search client and get the query embedding
    mosaic_client = VectorSearchClient()
    query_embedding = get_text_embedding(query_text)
    print(f"Query embedding generated: {len(query_embedding)} dimensions")

    # Perform the similarity search
    results = index.similarity_search(
        query_vector=query_embedding,
        num_results=num_results,
        columns=["id", "url", "title"]
    )
    return results
```
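The raw response from `similarity_search` is a nested dictionary rather than a list of records, and the recommendation code on this page relies on a `parse_search_results` helper that is not defined in the snippets. A minimal sketch of such a helper, assuming the response layout puts column names under `manifest.columns` and rows under `result.data_array` with a trailing relevance score per row:

```python
def parse_search_results(results):
    """Flatten a Vector Search response into a list of dicts.

    Assumed shape: column names under results['manifest']['columns'] and
    row values under results['result']['data_array'], with the relevance
    score as the final value of each row.
    """
    columns = [col["name"] for col in results.get("manifest", {}).get("columns", [])]
    rows = results.get("result", {}).get("data_array", [])
    parsed = []
    for row in rows:
        item = dict(zip(columns, row))
        if "score" not in item:
            item["score"] = row[-1]  # score is appended as the last value
        parsed.append(item)
    return parsed
```

Keeping the parsing in one place means both the search and recommendation paths return plain dictionaries with `id`, `url`, `title`, and `score` keys.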
Video recommendation
The `get_video_recommendations` function takes a video ID and the number of recommendations to return as parameters, and performs a similarity search to find the most similar videos:
```python
def get_video_recommendations(video_id, num_recommendations=5):
    # Initialize the Vector Search client
    mosaic_client = VectorSearchClient()

    # First, retrieve the embedding for the given video_id
    source_df = spark.table("videos_source_embeddings")
    video_embedding = source_df.filter(f"id = {video_id}").select("embedding").first()

    if not video_embedding:
        print(f"No video found with id: {video_id}")
        return []

    # Perform similarity search using the video's embedding
    try:
        results = index.similarity_search(
            query_vector=video_embedding["embedding"],
            num_results=num_recommendations + 1,  # +1 to account for the input video
            columns=["id", "url", "title"]
        )

        # Parse the results
        recommendations = parse_search_results(results)

        # Remove the input video from recommendations if present
        recommendations = [r for r in recommendations if r.get('id') != video_id]

        return recommendations[:num_recommendations]
    except Exception as e:
        print(f"Error during recommendation: {e}")
        return []

# Helper function to display recommendations
def display_recommendations(recommendations):
    if recommendations:
        print(f"Top {len(recommendations)} recommended videos:")
        for i, video in enumerate(recommendations, 1):
            print(f"{i}. Title: {video.get('title', 'N/A')}")
            print(f"   URL: {video.get('url', 'N/A')}")
            print(f"   Similarity Score: {video.get('score', 'N/A')}")
            print()
    else:
        print("No recommendations found.")

# Example usage
video_id = 1  # Assuming this is a valid video ID in your dataset
recommendations = get_video_recommendations(video_id)
display_recommendations(recommendations)
```
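Both the text-to-video search and the recommendation flow reduce to the same operation: ranking stored vectors by similarity to a query vector. As a minimal, self-contained illustration of that ranking step, with made-up 4-dimensional vectors standing in for the 1024-dimensional Marengo embeddings (all names and values here are toy assumptions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_embedding, catalog):
    """Return (video_id, score) pairs sorted best-first."""
    scored = [(vid, cosine_similarity(query_embedding, emb))
              for vid, emb in catalog.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy embeddings standing in for stored video embeddings.
catalog = {
    "surfing_clip": [0.9, 0.1, 0.0, 0.2],
    "beach_drone": [0.6, 0.3, 0.2, 0.4],
    "cooking_demo": [0.1, 0.8, 0.3, 0.0],
}
query = [0.85, 0.15, 0.05, 0.25]  # e.g. an embedding of "waves and surfboards"
print(rank_by_similarity(query, catalog))
```

The Vector Search index performs this ranking at scale with approximate nearest-neighbor search instead of the exhaustive scan shown here, but the notion of "similar" is the same: vectors close in the shared embedding space.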
Next steps
After reading this page, you have the following options:
- Customize and use the example: After implementing the basic integration, consider these improvements:
  - Update and synchronize the index: Implement efficient incremental updates and scheduled synchronization jobs using Delta Lake features.
  - Optimize performance and scaling: Leverage distributed processing, intelligent caching, and index partitioning for larger video libraries.
  - Monitoring and analytics: Track key performance metrics, implement feedback loops, and correlate capabilities with business metrics.
- Explore further: Try the applications built by the community or our sample applications to get more insights into the Twelve Labs Video Understanding Platform's diverse capabilities and learn more about integrating the platform into your applications.