Summary: This integration combines TwelveLabs’ Analyze and Embed APIs with Vespa to create an efficient solution for semantic video search. It captures rich video content as multimodal embeddings and utilizes Vespa’s robust indexing and hybrid ranking capabilities to deliver precise and relevant search results.

Description: The process of performing a semantic video search using TwelveLabs and Vespa involves five main steps:

Create summaries and keywords using the Analyze API.
Create multimodal embeddings for your video content using the Embed API.
Deploy a Vespa application to index these embeddings.
Use the embeddings to perform vector searches with hybrid ranking.
Review the results.

Code explanation: Our blog post, Multivector Video Retrieval with TwelveLabs and Vespa, guides you through the process of building a semantic video search solution.

Colab Notebook: video_search_twelvelabs_cloud

Integration with TwelveLabs

This section shows how to use the Analyze and Embed APIs to create metadata and embeddings for your videos. Both are later indexed so that relevant video segments can be retrieved.

Generate summaries and keywords

The code below uploads videos to an index and monitors the processing status:

Python

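from twelvelabs.models.embed import EmbeddingsTask

# Print the status each time a task is polled.
# The same callback is reused for the embedding tasks later on this page.
def on_task_update(task: EmbeddingsTask):
    print(f"  Status={task.status}")

for video_url in VIDEO_URLs:
    # Upload the video by URL, then block until indexing finishes
    task = client.task.create(index_id=index.id, url=video_url, language="en")
    task.wait_for_done(sleep_interval=10, callback=on_task_update)
    if task.status != "ready":
        raise RuntimeError(f"Indexing failed with status {task.status}")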

Once the videos are processed, you can generate rich metadata using the /summarize and /analyze endpoints. This code creates summaries and lists of keywords for each video to enhance search capabilities:

Python

summaries = []
keywords_array = []
# Human-readable titles for the videos
titles = [
    "Mr. Bean the Animated Series Holiday for Teddy",
    "Twas the night before Christmas",
    "Hide and Seek with Giant Jenny",
]

# List the videos in the index created in the previous step
videos = client.index.video.list(index.id)
for video in videos:
    # Generate summary
    res = client.generate.summarize(
        video_id=video.id,
        type="summary",
        prompt="Generate an abstract of the video serving as metadata on the video, up to five sentences."
    )
    summaries.append(res.summary)

    # Generate keywords
    keywords = client.generate.text(
        video_id=video.id,
        prompt="Based on this video, I want to generate five keywords for SEO. Provide just the keywords as a comma delimited list."
    )
    keywords_array.append(keywords.data)
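The keywords prompt asks for a comma-delimited list, so each entry in keywords_array is a single string. The sketch below shows one way to turn such a string into a clean list before indexing. parse_keywords is a hypothetical helper, not part of the TwelveLabs SDK, and it assumes the model follows the requested format.

```python
from typing import List


def parse_keywords(raw: str, limit: int = 5) -> List[str]:
    """Split a comma-delimited keyword string into a deduplicated list."""
    seen = set()
    keywords = []
    for part in raw.split(","):
        # Trim surrounding whitespace and any leading/trailing periods
        keyword = part.strip().strip(".")
        key = keyword.lower()
        # Skip empty entries and case-insensitive duplicates
        if keyword and key not in seen:
            seen.add(key)
            keywords.append(keyword)
    return keywords[:limit]


# Made-up sample string in the shape the prompt requests
print(parse_keywords(" Mr. Bean, teddy bear , holiday, Teddy Bear,cartoon. "))
```

Keeping the keywords as a list makes it easier to store them in an array field or to join them back into one string, depending on how your schema defines the field.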

Create video embeddings

The code below creates multimodal embeddings for each video. These embeddings capture the temporal and contextual nuances of the video content:

Python

task_ids = []

for url in VIDEO_URLs:
    # Start an embedding task for each video and wait for it to finish
    task = client.embed.task.create(model_name="Marengo-retrieval-2.7", video_url=url)
    task_ids.append(str(task.id))
    task.wait_for_done(sleep_interval=10, callback=on_task_update)
    if task.status != "ready":
        raise RuntimeError(f"Embedding failed with status {task.status}")

# Retrieve the completed tasks, which carry the embeddings
tasks = []
for task_id in task_ids:
    task = client.embed.task.retrieve(task_id)
    tasks.append(task)

See the Create video embeddings section for details.
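Before Vespa can index the embeddings, each video's metadata and segment vectors have to be reshaped into one document. The sketch below is one possible shape, not the layout used in the notebook. It assumes a schema with a mixed tensor field named embeddings keyed by segment index, plus start_offset_sec and end_offset_sec arrays. These field names, the to_vespa_fields helper, and the plain-dict segment layout are all illustrative assumptions, so adapt them to your schema and to the segment objects the SDK returns.

```python
from typing import Any, Dict, List


def to_vespa_fields(title: str, summary: str, keywords: str,
                    segments: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build the fields payload for one video document.

    Each segment is assumed to be a dict with start_offset_sec,
    end_offset_sec, and embeddings_float keys (hypothetical layout).
    """
    return {
        "title": title,
        "summary": summary,
        "keywords": keywords,
        "start_offset_sec": [s["start_offset_sec"] for s in segments],
        "end_offset_sec": [s["end_offset_sec"] for s in segments],
        # One dense vector per segment, keyed by the segment index as a string
        "embeddings": {
            str(i): s["embeddings_float"] for i, s in enumerate(segments)
        },
    }


# Toy segments with 2-dimensional vectors, for illustration only
toy_segments = [
    {"start_offset_sec": 0.0, "end_offset_sec": 6.0, "embeddings_float": [0.1, 0.2]},
    {"start_offset_sec": 6.0, "end_offset_sec": 12.0, "embeddings_float": [0.3, 0.4]},
]
fields = to_vespa_fields("Example title", "Example summary", "a, b", toy_segments)
print(sorted(fields["embeddings"]))
```

Storing all segment vectors in a single document keeps one Vespa document per video, which is what lets a nearest neighbor match on any segment surface the whole video.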

Create text embeddings

The code below generates an embedding for your text query:

Python

from twelvelabs import TwelveLabs

client = TwelveLabs(api_key=TL_API_KEY)
user_query = "Santa Claus on his sleigh"

# Generate embedding for the query
res = client.embed.create(
    model_name="Marengo-retrieval-2.7",
    text=user_query,
)

print("Created a text embedding")
print(f" Model: {res.model_name}")
if res.text_embedding is not None and res.text_embedding.segments is not None:
    # A text query yields a single segment; keep its vector for the search
    q_embedding = res.text_embedding.segments[0].embeddings_float
    print(f" Embedding Dimension: {len(q_embedding)}")
    print(f" Sample 5 values from array: {q_embedding[:5]}")

See the Create text embeddings section for details.
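To build intuition for what the vector side of the search does with q_embedding, the sketch below scores a query vector against a few segment vectors using cosine similarity and picks the closest one. It is a brute-force illustration in plain Python with toy 3-dimensional vectors, not how Vespa executes the search. The distance metric Vespa actually uses depends on your schema, and cosine_similarity and best_segment are hypothetical helpers.

```python
import math
from typing import Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as dissimilar
        return 0.0
    return dot / (norm_a * norm_b)


def best_segment(query: Sequence[float],
                 segments: Sequence[Sequence[float]]) -> Tuple[int, float]:
    """Return (index, score) of the segment most similar to the query."""
    scores = [cosine_similarity(query, segment) for segment in segments]
    index = max(range(len(scores)), key=scores.__getitem__)
    return index, scores[index]


# Toy vectors for illustration; real embeddings have many more dimensions
toy_query = [1.0, 0.0, 0.0]
toy_segments = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [-1.0, 0.0, 0.0]]
index, score = best_segment(toy_query, toy_segments)
print(index, round(score, 3))
```

An exhaustive scan like this grows linearly with the number of segments, which is the cost that an approximate nearest neighbor index is meant to avoid.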

Perform hybrid searches

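The code below runs a hybrid query against Vespa. The YQL matches documents either lexically through userQuery() or through approximate nearest neighbor (ANN) search over the embeddings field, and the hybrid rank profile combines the lexical (BM25) score with vector similarity. Because hits is set to 1, the query returns only the top-ranked hit: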

Python

import json

from vespa.io import VespaQueryResponse

with app.syncio(connections=1) as session:
    response: VespaQueryResponse = session.query(
        # Match on the query text OR on the 100 nearest neighbors of q
        yql="select * from videos where userQuery() OR ({targetHits:100}nearestNeighbor(embeddings,q))",
        query=user_query,
        ranking="hybrid",
        hits=1,
        # Pass the text embedding as the query tensor q
        body={"input.query(q)": q_embedding},
    )
    assert response.is_successful()

# Print the top hit
for hit in response.hits:
    print(json.dumps(hit, indent=4))

# Get full response JSON
response.get_json()
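Each entry in response.hits is a dictionary, so printing a whole hit can produce a lot of output. The sketch below pulls out only the readable parts. It assumes each hit carries a relevance score and a fields mapping, and that the schema defines title and summary fields (hypothetical names, adjust them to your schema). summarize_hit is an illustrative helper, not a pyvespa function.

```python
from typing import Any, Dict


def summarize_hit(hit: Dict[str, Any], max_chars: int = 80) -> Dict[str, Any]:
    """Reduce a Vespa hit to its title, relevance score, and a short summary."""
    fields = hit.get("fields", {})
    summary = fields.get("summary", "")
    if len(summary) > max_chars:
        # Truncate long summaries so the output stays on one line
        summary = summary[:max_chars].rstrip() + "..."
    return {
        "title": fields.get("title", "<untitled>"),
        "relevance": hit.get("relevance"),
        "summary": summary,
    }


# Made-up sample hit, for illustration only
example_hit = {
    "relevance": 0.42,
    "fields": {
        "title": "Twas the night before Christmas",
        "summary": "An example summary that stands in for the generated text.",
    },
}
print(summarize_hit(example_hit))
```

With a real response, the same helper can be applied as [summarize_hit(hit) for hit in response.hits].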

Next steps

After reading this page, you have the following options:

Customize and use the example: Use the video_search_twelvelabs_cloud notebook to understand how the integration works. You can make changes and add functionalities to suit your specific use case.
Explore further: Try the applications built by the community or our sample applications to get more insights into the TwelveLabs Video Understanding Platform’s diverse capabilities and learn more about integrating the platform into your applications.