Pinecone - Multimodal RAG

Summary: This integration combines Twelve Labs' Embed and Generate APIs with Pinecone's hosted vector database to build RAG-based video Q&A applications. It transforms video content into rich embeddings that can be stored, indexed, and queried to extract text answers from unstructured video databases.
Description: The process of performing video-based question answering using Twelve Labs and Pinecone involves the following steps:
- Generate rich, contextual embeddings from your video content using the Embed API
- Store and index these embeddings in Pinecone's vector database
- Perform semantic searches to find relevant video segments
- Generate natural language responses using the Generate API
This integration also showcases the difference in developer experience between generating text responses with the Generate API and with a leading open-source model, LLaVA-NeXT-Video, allowing you to compare the two approaches and select the most suitable solution for your needs.
Step-by-step guide: Our blog post, Multimodal RAG: Chat with Videos Using Twelve Labs and Pinecone, guides you through the process of creating a RAG-based video Q&A application.
Colab Notebook: TwelveLabs_Pinecone_Chat_with_video.
Integration with Twelve Labs
This section describes how the application uses the Twelve Labs Python SDK with Pinecone to implement video question answering. The integration involves the following steps:
- Video embedding generation using the Embed API
- Vector database storage and indexing
- Similarity search for relevant video segments
- Natural language response generation using the Generate API
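The snippets that follow reference a Twelve Labs client (twelvelabs_client), a Pinecone client (pc), and an existing Pinecone index without showing how they are created. A minimal setup sketch is shown below; the environment variable names, the index name, and the 1024-dimension cosine serverless configuration for Marengo-retrieval-2.6 embeddings are assumptions you should adapt to your environment:

import os

from pinecone import Pinecone, ServerlessSpec
from twelvelabs import TwelveLabs
# EmbeddingsTask is used in the type hint of the task callback further below
from twelvelabs.models.embed import EmbeddingsTask

# Initialize the Twelve Labs client (referred to as twelvelabs_client below)
twelvelabs_client = TwelveLabs(api_key=os.environ["TWELVE_LABS_API_KEY"])

# Initialize the Pinecone client (referred to as pc below)
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Create the index if it does not already exist; the dimension, metric, and
# serverless spec are assumptions and should be adjusted to your setup
index_name = "twelve-labs"
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=1024,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )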
Video embeddings
The generate_embedding function generates embeddings for a video file:
def generate_embedding(video_file, engine="Marengo-retrieval-2.6"):
    """
    Generate embeddings for a video file using Twelve Labs API.

    Args:
        video_file (str): Path to the video file
        engine (str): Embedding engine name

    Returns:
        tuple: Embeddings and metadata
    """
    # Create an embedding task
    task = twelvelabs_client.embed.task.create(
        engine_name=engine,
        video_file=video_file
    )
    print(f"Created task: id={task.id} engine_name={task.engine_name} status={task.status}")

    # Monitor task progress
    def on_task_update(task: EmbeddingsTask):
        print(f" Status={task.status}")

    status = task.wait_for_done(
        sleep_interval=2,
        callback=on_task_update
    )
    print(f"Embedding done: {status}")

    # Retrieve results
    task_result = twelvelabs_client.embed.task.retrieve(task.id)

    # Extract embeddings and metadata
    embeddings = task_result.float
    time_ranges = task_result.time_ranges
    scope = task_result.scope

    return embeddings, time_ranges, scope
For details on creating video embeddings, see the Create video embeddings page.
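As a quick check, you can call the function directly; the file path below is a hypothetical placeholder:

# Hypothetical local video file; replace with your own
embeddings, time_ranges, scope = generate_embedding("videos/keynote.mp4")

print(f"Generated {len(embeddings)} segment embeddings")
print(f"First time range: {time_ranges[0]}, scope: {scope}")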
The ingest_data function stores embeddings in Pinecone:
def ingest_data(video_file, index_name="twelve-labs"):
    """
    Generate embeddings and store them in Pinecone.

    Args:
        video_file (str): Path to the video file
        index_name (str): Name of the Pinecone index
    """
    # Generate embeddings
    embeddings, time_ranges, scope = generate_embedding(video_file)

    # Connect to Pinecone index
    index = pc.Index(index_name)

    # Prepare vectors for upsert
    vectors = []
    for i, embedding in enumerate(embeddings):
        vectors.append({
            "id": f"{video_file}_{i}",
            "values": embedding,
            "metadata": {
                "video_file": video_file,
                "time_range": time_ranges[i],
                "scope": scope
            }
        })

    # Upsert vectors to Pinecone
    index.upsert(vectors=vectors)
    print(f"Successfully ingested {len(vectors)} embeddings into Pinecone")
Video search
The search_video_segments function creates text embeddings and performs similarity searches to find relevant video segments using the embeddings that have already been stored in Pinecone:
def search_video_segments(question, index_name="twelve-labs", top_k=5):
    """
    Search for relevant video segments based on a question.

    Args:
        question (str): Question text
        index_name (str): Name of the Pinecone index
        top_k (int): Number of results to retrieve

    Returns:
        list: Relevant video segments and their metadata
    """
    # Generate text embedding for the question
    question_embedding = twelvelabs_client.embed.create(
        engine_name="Marengo-retrieval-2.6",
        text=question
    ).text_embedding.float

    # Query Pinecone
    index = pc.Index(index_name)
    query_results = index.query(
        vector=question_embedding,
        top_k=top_k,
        include_metadata=True
    )

    # Process and return results
    results = []
    for match in query_results.matches:
        results.append({
            "score": match.score,
            "video_file": match.metadata["video_file"],
            "time_range": match.metadata["time_range"],
            "scope": match.metadata["scope"]
        })

    return results
For details on creating text embeddings, see the Create text embeddings page.
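A usage sketch, with an example question:

# Example question; any natural-language query about your indexed videos works
segments = search_video_segments("When is the new feature demonstrated?", top_k=3)

for segment in segments:
    start_time, end_time = segment["time_range"]
    print(f"{segment['video_file']} [{start_time}s-{end_time}s] score={segment['score']:.3f}")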
Natural language responses
After retrieving relevant video segments, the application uses the Generate API to create natural language responses:
def generate_response(question, video_segments):
    """
    Generate a natural language response using Pegasus.

    Args:
        question (str): The user's question
        video_segments (list): Relevant video segments from search

    Returns:
        str: Generated response based on video content
    """
    # Prepare context from video segments
    context = []
    for segment in video_segments:
        # Get the video clip based on time range
        video_file = segment["video_file"]
        start_time, end_time = segment["time_range"]

        # You can extract the clip or use the metadata directly
        context.append({
            "content": f"Video segment from {video_file}, {start_time}s to {end_time}s",
            "score": segment["score"]
        })

    # Generate response using Twelve Labs Generate API
    response = twelvelabs_client.generate.create(
        engine_name="Pegasus-1.0",
        prompt=question,
        contexts=context,
        max_tokens=250
    )

    return response.generated_text
For details on generating open-ended texts based on your videos, see the Open-ended text page.
Create a complete Q&A function
The application creates a complete Q&A function by combining search and response generation:
def video_qa(question, index_name="twelve-labs"):
    """
    Complete video Q&A pipeline.

    Args:
        question (str): User's question
        index_name (str): Pinecone index name

    Returns:
        dict: Response with answer and supporting video segments
    """
    # Find relevant video segments
    video_segments = search_video_segments(question, index_name)

    # Generate response using Pegasus
    answer = generate_response(question, video_segments)

    return {
        "question": question,
        "answer": answer,
        "supporting_segments": video_segments
    }
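Putting it all together, a minimal usage sketch (with a placeholder question) might look like this:

result = video_qa("What products are announced in the keynote?")

print("Q:", result["question"])
print("A:", result["answer"])
print("Supporting segments:")
for segment in result["supporting_segments"]:
    start_time, end_time = segment["time_range"]
    print(f"  {segment['video_file']} [{start_time}s-{end_time}s]")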
Next steps
After reading this page, you have the following options:
- Customize and use the example: Use the TwelveLabs_Pinecone_Chat_with_video notebook to understand how the integration works. You can make changes and add functionalities to suit your specific use case. Below are a few examples:
  - Training a linear adapter on top of the embeddings to better fit your data.
  - Re-ranking videos using Pegasus when clips from different videos are returned.
  - Adding textual summary data for each video to the Pinecone entries to create a hybrid search system, enhancing accuracy using Pinecone's Metadata capabilities.
- Explore further: Try the applications built by the community or our sample applications to get more insights into the Twelve Labs Video Understanding Platform's diverse capabilities and learn more about integrating the platform into your applications.