Vespa - Multivector video retrieval with TwelveLabs and Vespa

Summary: This integration combines TwelveLabs’ Generate and Embed APIs with Vespa to create an efficient solution for semantic video search. It captures rich video content as multimodal embeddings and utilizes Vespa’s robust indexing and hybrid ranking capabilities to deliver precise and relevant search results.

Description: The process of performing a semantic video search using TwelveLabs and Vespa involves five main steps:

  1. Create summaries and keywords using the Generate API.
  2. Create multimodal embeddings for your video content using the Embed API.
  3. Deploy a Vespa application to index these embeddings (see the sketch after this list).
  4. Use the embeddings to perform vector searches with hybrid ranking.
  5. Review the results.
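
For step 3, the blog post and notebook walk through the full Vespa application. As a rough orientation, the sketch below is a minimal pyvespa package under assumed names: a videos schema with title, summary, and keywords text fields, an embeddings mixed tensor holding one 1024-dimensional Marengo embedding per video segment, and a hybrid rank profile. The tenant and application names are placeholders, and the exact definitions live in the notebook.

Python
from vespa.package import (
    ApplicationPackage, Schema, Document, Field, FieldSet, HNSW, RankProfile
)
from vespa.deployment import VespaCloud

# Mixed tensor field: one 1024-dim Marengo embedding per video segment (multivector retrieval).
video_schema = Schema(
    name="videos",
    document=Document(
        fields=[
            Field(name="title", type="string", indexing=["index", "summary"], index="enable-bm25"),
            Field(name="summary", type="string", indexing=["index", "summary"], index="enable-bm25"),
            Field(name="keywords", type="string", indexing=["index", "summary"], index="enable-bm25"),
            Field(
                name="embeddings",
                type="tensor<float>(p{},x[1024])",
                indexing=["attribute", "index"],
                ann=HNSW(distance_metric="angular"),
            ),
        ]
    ),
    fieldsets=[FieldSet(name="default", fields=["title", "summary", "keywords"])],
    rank_profiles=[
        RankProfile(
            name="hybrid",
            inputs=[("query(q)", "tensor<float>(x[1024])")],
            first_phase="closeness(field, embeddings) + bm25(title) + bm25(summary) + bm25(keywords)",
        )
    ],
)

app_package = ApplicationPackage(name="videosearch", schema=[video_schema])

# Deploy to Vespa Cloud; tenant and application names are placeholders.
vespa_cloud = VespaCloud(
    tenant="my-tenant",
    application="videosearch",
    application_package=app_package,
)
app = vespa_cloud.deploy()

The mapped dimension p{} lets a single document hold one embedding per segment, so nearestNeighbor scores each video by its closest segment rather than by a single pooled vector.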

Step-by-step guide: Our blog post, Multivector Video Retrieval with TwelveLabs and Vespa, guides you through the process of building a semantic video search solution.

Colab Notebook: video_search_twelvelabs_cloud

Integration with TwelveLabs

This section shows how to use the Generate and Embed APIs to create the video metadata and embeddings that enable efficient retrieval of relevant video segments.

Generate summaries and keywords

The code below uploads videos to an index and monitors the processing status:

Python
def on_task_update(task: EmbeddingsTask):
    print(f" Status={task.status}")

for video_url in VIDEO_URLs:
    task = client.task.create(index_id=index.id, url=video_url, language="en")
    status = task.wait_for_done(sleep_interval=10, callback=on_task_update)
    if task.status != "ready":
        raise RuntimeError(f"Indexing failed with status {task.status}")

See the Upload videos section for details.
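
The snippet above assumes that a TwelveLabs client, an index, and a VIDEO_URLs list already exist. A minimal setup is sketched below; the index name, model configuration, and URLs are placeholders, and parameter names can vary between SDK versions, so check the Create indexes page for the exact call.

Python
from twelvelabs import TwelveLabs
from twelvelabs.models.embed import EmbeddingsTask  # used by the on_task_update callback

# TL_API_KEY holds your TwelveLabs API key.
client = TwelveLabs(api_key=TL_API_KEY)

# Placeholder index configuration; model names and options depend on your SDK version.
index = client.index.create(
    name="video-search-demo",
    models=[{"name": "pegasus1.2", "options": ["visual", "audio"]}],
)

# Placeholder URLs; replace with your own publicly accessible video files.
VIDEO_URLs = [
    "https://example.com/videos/holiday-for-teddy.mp4",
    "https://example.com/videos/night-before-christmas.mp4",
    "https://example.com/videos/hide-and-seek.mp4",
]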

Once the videos are processed, you can generate rich metadata using the /summarize and /generate endpoints. This code creates summaries and lists of keywords for each video to enhance search capabilities:

Python
summaries = []
keywords_array = []
titles = [
    "Mr. Bean the Animated Series Holiday for Teddy",
    "Twas the night before Christmas",
    "Hide and Seek with Giant Jenny",
]

videos = client.index.video.list(index_id)
for video in videos:
    # Generate summary
    res = client.generate.summarize(
        video_id=video.id,
        type="summary",
        prompt="Generate an abstract of the video serving as metadata on the video, up to five sentences."
    )
    summaries.append(res.summary)

    # Generate keywords
    keywords = client.generate.text(
        video_id=video.id,
        prompt="Based on this video, I want to generate five keywords for SEO. Provide just the keywords as a comma delimited list."
    )
    keywords_array.append(keywords.data)

See the Generate text from videos section for details.

Create video embeddings

The code below creates multimodal embeddings for each video. These embeddings capture the temporal and contextual nuances of the video content:

Python
task_ids = []

for url in VIDEO_URLs:
    task = client.embed.task.create(model_name="Marengo-retrieval-2.7", video_url=url)
    task_ids.append(str(task.id))
    status = task.wait_for_done(sleep_interval=10, callback=on_task_update)
    if task.status != "ready":
        raise RuntimeError(f"Embedding failed with status {task.status}")

tasks = []
for task_id in task_ids:
    task = client.embed.task.retrieve(task_id)
    tasks.append(task)

See the Create video embeddings section for details.
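
Once the embedding tasks are done, each video's segment embeddings can be fed into the Vespa application sketched earlier, together with the titles, summaries, and keywords generated above. The sketch below assumes the videos schema and app handle from that sketch, and that each retrieved task exposes video_embedding.segments with embeddings_float per segment (mirroring the text-embedding response shown next); the notebook contains the exact feed logic.

Python
for i, task in enumerate(tasks):
    segments = task.video_embedding.segments

    # Cells of the mixed tensor: one 1024-dim embedding per segment, keyed by segment index.
    embeddings = {"blocks": {str(j): seg.embeddings_float for j, seg in enumerate(segments)}}

    app.feed_data_point(
        schema="videos",
        data_id=str(i),
        fields={
            "title": titles[i],
            "summary": summaries[i],
            "keywords": keywords_array[i],
            "embeddings": embeddings,
        },
    )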

Create text embeddings

The code below generates an embedding for your text query:

Python
client = TwelveLabs(api_key=TL_API_KEY)
user_query = "Santa Claus on his sleigh"

# Generate embedding for the query
res = client.embed.create(
    model_name="Marengo-retrieval-2.7",
    text=user_query,
)

print("Created a text embedding")
print(f" Model: {res.model_name}")
if res.text_embedding is not None and res.text_embedding.segments is not None:
    q_embedding = res.text_embedding.segments[0].embeddings_float
    print(f" Embedding Dimension: {len(q_embedding)}")
    print(f" Sample 5 values from array: {q_embedding[:5]}")

See the Create text embeddings section for details.

Perform hybrid searches

The code below uses Vespa’s approximate nearest neighbor (ANN) search capabilities to combine lexical search (BM25) with vector similarity ranking. The query retrieves the top hit based on hybrid ranking:

Python
import json

from vespa.io import VespaQueryResponse

with app.syncio(connections=1) as session:
    response: VespaQueryResponse = session.query(
        yql="select * from videos where userQuery() OR ({targetHits:100}nearestNeighbor(embeddings,q))",
        query=user_query,
        ranking="hybrid",
        hits=1,
        body={"input.query(q)": q_embedding},
    )
    assert response.is_successful()

# Print the top hit
for hit in response.hits:
    print(json.dumps(hit, indent=4))

# Get full response JSON
response.get_json()
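
Each hit contains the hybrid relevance score and the stored document fields, so under the assumed schema the matched video's metadata can be read directly (field names follow the schema sketch above):

Python
top_hit = response.hits[0]
print(f"Relevance: {top_hit['relevance']}")
print(f"Title: {top_hit['fields']['title']}")
print(f"Summary: {top_hit['fields']['summary']}")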

Next steps

After reading this page, you have the following options:

  • Customize and use the example: Use the video_search_twelvelabs_cloud notebook to understand how the integration works. You can make changes and add functionalities to suit your specific use case.
  • Explore further: Try the applications built by the community or our sample applications to get more insights into the TwelveLabs Video Understanding Platform’s diverse capabilities and learn more about integrating the platform into your applications.