Pinecone - Multimodal RAG

Summary: This integration combines TwelveLabs’ Embed and Generate APIs with Pinecone’s hosted vector database to build RAG-based video Q&A applications. It transforms video content into rich embeddings that can be stored, indexed, and queried to extract text answers from unstructured video databases.

Description: The process of performing video-based question answering using TwelveLabs and Pinecone involves the following steps:

  • Generate rich, contextual embeddings from your video content using the Embed API
  • Store and index these embeddings in Pinecone’s vector database
  • Perform semantic searches to find relevant video segments
  • Generate natural language responses using the Generate API

This integration also showcases the difference in developer experience between using the Generate API and a leading open-source model, LLaVA-NeXT-Video, to generate text responses, allowing you to compare the two approaches and select the one that best suits your needs.

Step-by-step guide: Our blog post, Multimodal RAG: Chat with Videos Using TwelveLabs and Pinecone, guides you through the process of creating a RAG-based video Q&A application.

Colab Notebook: TwelveLabs_Pinecone_Chat_with_video.

Integration with TwelveLabs

This section describes how the application uses the TwelveLabs Python SDK with Pinecone to create a video Q&A application. The integration comprises the following main steps (a minimal client-setup sketch follows the list):

  • Video embedding generation using the Embed API
  • Vector database storage and indexing
  • Similarity search for relevant video segments
  • Natural language response generation using the Generate API
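
The snippets in this section refer to two pre-initialized clients, twelvelabs_client and pc. The following is a minimal setup sketch, not part of the original notebook: the environment variable names and index settings are placeholders, the EmbeddingsTask import path may vary by SDK version, and the dimension of 1024 assumes Marengo-retrieval-2.6 embeddings (verify against the Embed API reference).

Python
import os

from twelvelabs import TwelveLabs
from twelvelabs.models.embed import EmbeddingsTask  # used by the progress callback below
from pinecone import Pinecone, ServerlessSpec

# Clients referenced throughout this section (placeholder environment variable names).
twelvelabs_client = TwelveLabs(api_key=os.environ["TWELVE_LABS_API_KEY"])
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Create the index once; 1024 assumes the Marengo-retrieval-2.6 embedding dimension.
index_name = "twelve-labs"
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=1024,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )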

Video embeddings

The generate_embedding function generates embeddings for a video file:

Python
def generate_embedding(video_file, engine="Marengo-retrieval-2.6"):
    """
    Generate embeddings for a video file using TwelveLabs API.

    Args:
        video_file (str): Path to the video file
        engine (str): Embedding engine name

    Returns:
        tuple: Embeddings and metadata
    """
    # Create an embedding task
    task = twelvelabs_client.embed.task.create(
        engine_name=engine,
        video_file=video_file
    )
    print(f"Created task: id={task.id} engine_name={task.engine_name} status={task.status}")

    # Monitor task progress
    def on_task_update(task: EmbeddingsTask):
        print(f"  Status={task.status}")

    status = task.wait_for_done(
        sleep_interval=2,
        callback=on_task_update
    )
    print(f"Embedding done: {status}")

    # Retrieve results
    task_result = twelvelabs_client.embed.task.retrieve(task.id)

    # Extract embeddings and metadata
    embeddings = task_result.float
    time_ranges = task_result.time_ranges
    scope = task_result.scope

    return embeddings, time_ranges, scope

For details on creating video embeddings, see the Create video embeddings page.
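
As a quick check, you might call the function directly and inspect the result; the file path below is a placeholder.

Python
# Hypothetical usage; "my_video.mp4" is a placeholder path.
embeddings, time_ranges, scope = generate_embedding("my_video.mp4")
print(f"Generated {len(embeddings)} clip embeddings")
print(f"First segment: {time_ranges[0]} (scope: {scope})")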

The ingest_data function stores embeddings in Pinecone:

Python
def ingest_data(video_file, index_name="twelve-labs"):
    """
    Generate embeddings and store them in Pinecone.

    Args:
        video_file (str): Path to the video file
        index_name (str): Name of the Pinecone index
    """
    # Generate embeddings
    embeddings, time_ranges, scope = generate_embedding(video_file)

    # Connect to Pinecone index
    index = pc.Index(index_name)

    # Prepare vectors for upsert
    vectors = []
    for i, embedding in enumerate(embeddings):
        vectors.append({
            "id": f"{video_file}_{i}",
            "values": embedding,
            "metadata": {
                "video_file": video_file,
                "time_range": time_ranges[i],
                "scope": scope
            }
        })

    # Upsert vectors to Pinecone
    index.upsert(vectors=vectors)
    print(f"Successfully ingested {len(vectors)} embeddings into Pinecone")

The search_video_segments function embeds the question text and runs a similarity search against the video embeddings already stored in Pinecone to find the most relevant segments:

Python
def search_video_segments(question, index_name="twelve-labs", top_k=5):
    """
    Search for relevant video segments based on a question.

    Args:
        question (str): Question text
        index_name (str): Name of the Pinecone index
        top_k (int): Number of results to retrieve

    Returns:
        list: Relevant video segments and their metadata
    """
    # Generate text embedding for the question
    question_embedding = twelvelabs_client.embed.create(
        engine_name="Marengo-retrieval-2.6",
        text=question
    ).text_embedding.float

    # Query Pinecone
    index = pc.Index(index_name)
    query_results = index.query(
        vector=question_embedding,
        top_k=top_k,
        include_metadata=True
    )

    # Process and return results
    results = []
    for match in query_results.matches:
        results.append({
            "score": match.score,
            "video_file": match.metadata["video_file"],
            "time_range": match.metadata["time_range"],
            "scope": match.metadata["scope"]
        })

    return results

For details on creating text embeddings, see the Create text embeddings page.
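
For example, you could retrieve the top matches for a question and print where each one occurs; the question string is illustrative.

Python
# Illustrative query against the previously ingested video(s).
segments = search_video_segments("When does the speaker demo the product?", top_k=3)
for seg in segments:
    start_time, end_time = seg["time_range"]
    print(f"{seg['video_file']} [{start_time}s to {end_time}s] score={seg['score']:.3f}")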

Natural language responses

After retrieving relevant video segments, the application uses the Generate API to create natural language responses:

Python
def generate_response(question, video_segments):
    """
    Generate a natural language response using Pegasus.

    Args:
        question (str): The user's question
        video_segments (list): Relevant video segments from search

    Returns:
        str: Generated response based on video content
    """
    # Prepare context from video segments
    context = []
    for segment in video_segments:
        # Get the video clip based on time range
        video_file = segment["video_file"]
        start_time, end_time = segment["time_range"]

        # You can extract the clip or use the metadata directly
        context.append({
            "content": f"Video segment from {video_file}, {start_time}s to {end_time}s",
            "score": segment["score"]
        })

    # Generate response using the TwelveLabs Generate API
    response = twelvelabs_client.generate.create(
        engine_name="Pegasus-1.0",
        prompt=question,
        contexts=context,
        max_tokens=250
    )

    return response.generated_text

For details on generating open-ended texts from videos, see the Open-ended text page.

Create a complete Q&A function

The application creates a complete Q&A function by combining search and response generation:

Python
def video_qa(question, index_name="twelve-labs"):
    """
    Complete video Q&A pipeline.

    Args:
        question (str): User's question
        index_name (str): Pinecone index name

    Returns:
        dict: Response with answer and supporting video segments
    """
    # Find relevant video segments
    video_segments = search_video_segments(question, index_name)

    # Generate response using Pegasus
    answer = generate_response(question, video_segments)

    return {
        "question": question,
        "answer": answer,
        "supporting_segments": video_segments
    }
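
Putting it together, a single call runs retrieval and response generation end to end; the question below is a placeholder.

Python
# Illustrative end-to-end call.
result = video_qa("What topics are covered in the video?")
print("Q:", result["question"])
print("A:", result["answer"])
for seg in result["supporting_segments"]:
    print(f"  supporting clip: {seg['video_file']} at {seg['time_range']}")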

Next steps

After reading this page, you have the following options:

  • Customize and use the example: Use the TwelveLabs_Pinecone_Chat_with_video notebook to understand how the integration works. You can make changes and add functionalities to suit your specific use case. Below are a few examples:
    • Training a linear adapter on top of the embeddings to better fit your data.
    • Re-ranking videos using Pegasus when clips from different videos are returned.
    • Adding textual summary data for each video to the Pinecone entries to create a hybrid search system, enhancing accuracy using Pinecone’s Metadata capabilities (see the sketch after this list).
  • Explore further: Try the applications built by the community or our sample applications to get more insights into the TwelveLabs Video Understanding Platform’s diverse capabilities and learn more about integrating the platform into your applications.
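
As an illustration of the hybrid-search idea above, the sketch below combines vector similarity with a Pinecone metadata filter. It is a minimal sketch, not part of the notebook: it assumes a hypothetical "topics" list was added to each vector's metadata at ingest time, and the question text and filter values are placeholders.

Python
# Hypothetical hybrid query: vector similarity plus a metadata filter.
index = pc.Index("twelve-labs")

question_embedding = twelvelabs_client.embed.create(
    engine_name="Marengo-retrieval-2.6",
    text="How is pricing explained?"
).text_embedding.float

filtered_results = index.query(
    vector=question_embedding,
    top_k=5,
    include_metadata=True,
    filter={"topics": {"$in": ["pricing"]}},  # hypothetical metadata field added at ingest time
)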