The example projects on this page use the Twelve Labs Video Understanding Platform to address social and public-good causes. They show how multimodal AI can be applied to problems such as misinformation, civic participation, ocean pollution, dementia care, and environmental monitoring.

Israel Palestine Video Understanding

Summary: The "Israel Palestine Video Understanding" application addresses misinformation and promotes empathy regarding the Israel-Palestine conflict.

Description: The application aggregates and summarizes content from YouTube and Reddit, presenting diverse viewpoints on the issue. The summaries, which cover a range of opinions, are then visualized using an algorithm similar to t-SNE, giving an overview of the conflict's various perspectives. The application was developed by Sasha Sheng.

GitHub repo: Israel Palestine Video Understanding

Integration with Twelve Labs

This application invokes the /summarize endpoint to create summaries for videos based on their content, specifically focusing on their stance regarding the Israel-Palestine conflict and the level of violence depicted:

def generate_summary(videoID, videoID_to_filename):
    SUMMARIZE_URL = f"{API_URL}/summarize"
    headers = {
        "x-api-key": API_KEY
    }

    data = {
      "video_id": videoID,
      "type": "summary",
      "prompt": "Summarize if this video is pro-israel or pro-palestine or else and how violent it is."
    }

    response = requests.post(SUMMARIZE_URL, headers=headers, json=data)
    print(f"{videoID}: status code - {response.status_code}")

    summary_data = response.json()
    print(summary_data)

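    # Append one tab-separated row per video; `filename` is assumed to be the output path defined at module level
    with open(filename, 'a') as f: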
        writer = csv.writer(f, delimiter='\t')
        writer.writerow([videoID, videoID_to_filename[videoID][0], videoID_to_filename[videoID][1], summary_data.get('summary')])
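
The tab-separated rows written above can be read back for later steps such as the visualization. The following is a minimal sketch, not the project's code: it assumes the four-column layout produced by generate_summary, and the column names and the load_summaries helper are hypothetical.

```python
import csv
import io

# Hypothetical names for the four fields that generate_summary writes per row.
COLUMNS = ["video_id", "field_1", "field_2", "summary"]

def load_summaries(handle):
    """Read tab-separated rows into dicts, skipping malformed rows and empty summaries."""
    rows = []
    for record in csv.reader(handle, delimiter="\t"):
        if len(record) != len(COLUMNS):
            continue  # Skip rows that do not have exactly four fields
        row = dict(zip(COLUMNS, record))
        if row["summary"]:
            rows.append(row)
    return rows

# Made-up in-memory data used in place of a file on disk.
sample = "abc123\tclip.mp4\tyoutube\tA short summary.\nxyz789\tother.mp4\treddit\t\n"
summaries = load_summaries(io.StringIO(sample))
print(len(summaries))
```

Rows without a summary are dropped because csv.writer stores a missing summary (None) as an empty field.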

Accelerate SF Notifications

Summary: The "Accelerate SF Notifications" application makes public hearings easier to follow for residents and special interest groups, particularly those focused on San Francisco housing developments.

Description: The application addresses the challenge of keeping up with numerous and lengthy public hearings, where the critical issue is identifying relevant discussions without watching entire meetings. The application was developed by Rahul Pal, Lloyd Chang, and Haonan Chen.

Key features include:

Data scraping: Extract information from public agendas, live-streamed hearings, and sources like San Francisco Gov TV.
Issue tracking: Utilize algorithms to pinpoint and extract discussions about housing projects and specific issues within hearings.
Automated notifications: Implement a system that sends real-time alerts.

GitHub repo: Accelerate SF Notifications

Integration with Twelve Labs

The application uses the /summarize endpoint to perform the following main functions: summarize videos and generate lists of chapters.

A summary encapsulates the key points of a video clearly. The code below shows how the application generates summaries:


data = {
    "video_id": "6545f931195730422cc38329",
    "type": "summary"
}

# Send request
response = requests.post(f"{BASE_URL}/summarize", json=data, headers={"x-api-key": api_key})

A list of chapters provides a chronological breakdown of all the parts in a video. The following code shows how the application generates lists of chapters:

data = {
    "video_id": "6545f931195730422cc38329",
    "type": "chapter"
}

# Send request
response = requests.post(f"{BASE_URL}/summarize", json=data, headers={"x-api-key": api_key})
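
To connect the chapter output to the issue-tracking feature, the chapters could be filtered for relevant terms. This is a minimal sketch under stated assumptions rather than the project's implementation: it assumes the response body contains a chapters list whose items carry start, end, chapter_title, and chapter_summary fields, and the keyword list and the find_relevant_chapters helper are hypothetical.

```python
def find_relevant_chapters(response_body, keywords):
    """Return chapters whose title or summary mentions any keyword (case-insensitive)."""
    matches = []
    for chapter in response_body.get("chapters", []):
        title = chapter.get("chapter_title", "")
        summary = chapter.get("chapter_summary", "")
        text = f"{title} {summary}".lower()
        hits = [kw for kw in keywords if kw.lower() in text]
        if hits:
            matches.append({
                "start": chapter.get("start"),
                "end": chapter.get("end"),
                "title": title,
                "keywords": hits,
            })
    return matches

# Made-up response body for illustration only.
example_body = {
    "chapters": [
        {"start": 0, "end": 120, "chapter_title": "Roll call",
         "chapter_summary": "Attendance is taken."},
        {"start": 120, "end": 900, "chapter_title": "Housing project appeal",
         "chapter_summary": "Discussion of a proposed housing development and zoning."},
    ]
}
relevant = find_relevant_chapters(example_body, ["housing", "zoning"])
print(relevant)
```

The start and end values of each match could then be used to point a notification at the relevant part of the hearing.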

The application also invokes the /gist endpoint, which generates titles, topics, and hashtags that capture the essence of a video. The following code shows how the application invokes this endpoint:

data = {
    "video_id": "6545f931195730422cc38329",
    "types": [
        "title",
        "hashtag",
        "topic"
    ]
}

# Send request
response = requests.post(f"{BASE_URL}/gist", json=data, headers={"x-api-key": api_key})

Deep Green

Summary: The "Deep Green" application uses the Twelve Labs Video Understanding Platform to accurately detect and map ocean trash using aerial and satellite imagery.

Description: The application offers a solution to the problem of plastic pollution in the oceans. It detects different types of ocean trash with over 90% accuracy and can scan over 500 hours of video daily. Trash is timestamped and geographically pinpointed, allowing easy data analysis and export. The application was developed by Shalini Ananda and Hans Walker.

GitHub repo: Deep Green

Integration with Twelve Labs

The search_trash function searches for videos containing specific types of trash, returning a list of such videos with key information about each:

def search_trash(query, API_KEY):
    # Search the index for clips that visually match the query
    data = {
        "query": query,
        "index_id": INDEX_ID,
        "search_options": ["visual"]
    }

    response = requests.post(f"{API_URL}/search", headers={"x-api-key": API_KEY}, json=data)
    response = response.json()

    results = []

    # Collect the score, location type, and thumbnail for each match
    for item in response['data']:
        video_location = Get_Video_Metadata(item['video_id'], API_KEY)['Location Type']
        results.append({
            "score": item['score'],
            "video_location": video_location,
            "video_id": item['video_id'],
            "thumbnail_url": item['thumbnail_url']
        })

    return results

The search_video_single function finds specific content within a single video:

def search_video_single(video_id, query, API_KEY):
    headers = {
        "accept": "application/json",
        "x-api-key": API_KEY,
        "Content-Type": "application/json"
    }

    # Restrict the search to a single video with the filter parameter
    data = {
        "query": query,
        "search_options": ["visual", "conversation", "text_in_video", "logo"],
        "threshold": "high",
        "filter": {"id": [video_id]},
        "index_id": INDEX_ID
    }

    response = requests.post(f"{API_URL}/search", headers=headers, json=data)
    clips = response.json()['data']  # Parse the response body once

    results = []
    for clip in clips:
        results.append({"score": clip['score'], 'start_time': clip['start'], 'end_time': clip['end']})

    return results
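
Because Deep Green timestamps each detection, the clip list built above could be turned into a readable report. The following is a minimal sketch, assuming each item has the score, start_time, and end_time keys shown above with times in seconds. The format_clips helper and its min_score parameter are hypothetical.

```python
def to_timestamp(seconds):
    """Format a number of seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"

def format_clips(results, min_score=0.0):
    """Return report lines for clips scoring at least min_score, ordered by start time."""
    kept = [r for r in results if r["score"] >= min_score]
    kept.sort(key=lambda r: r["start_time"])
    return [
        f"{to_timestamp(r['start_time'])}-{to_timestamp(r['end_time'])} (score {r['score']:.1f})"
        for r in kept
    ]

# Made-up values for illustration only.
clips = [
    {"score": 84.2, "start_time": 95.0, "end_time": 101.5},
    {"score": 61.0, "start_time": 12.0, "end_time": 20.0},
]
lines = format_clips(clips, min_score=70)
print(lines)
```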

The classify_latest_video function classifies videos into specific environmental categories:

def classify_latest_video(id, file_name, API_KEY):
    classify_url = f"{API_URL}/classify"
    file_name = file_name.split('.')[0]

    video_list = get_video_list(API_KEY)
    time_initiated = time.time()
    video_uploaded=True
    video_index = 0
    for i, next_video in enumerate(video_list):
        if(next_video['metadata']['filename']==file_name):
            video_uploaded=False
            video_index = i
            break
    while(video_uploaded):
        time.sleep(60)
        video_list = get_video_list(API_KEY)
        for i, next_video in enumerate(video_list):
            print(next_video['metadata']['filename'],"   ",file_name)
            if(next_video['metadata']['filename']==file_name):
                video_uploaded=False
                video_index = i
                break
    
    id = video_list[video_index]["_id"]

    meta_url = f"{API_URL}/indexes/{INDEX_ID}/videos/{id}"

    print("\n\nStarting Metadata",time.time()-time_initiated,"\n\n", file=sys.stderr)
    payload = {
        "page_limit": 10,
        "include_clips": False,
        "threshold": {
            "min_video_score": 15,
            "min_clip_score": 15,
            "min_duration_ratio": 0.5
        },
        "show_detailed_score": False,
        "options": ["conversation"],
        "conversation_option": "semantic",
        "classes": [
            {
                "prompts": ["This video is taken in an urban environment", "This means a dense environment", "Lots of people, cars and buildings"],
                "options": ["visual"],
                "conversation_option": "semantic",
                "name": "Urban"
            },
            {
                "prompts": ["This video is taken in a suburban environment", "There should be buildings, roads", "Everything should be a lot more spread out", "The majority of the space should be developed"],
                "options": ["visual"],
                "conversation_option": "semantic",
                "name": "Suburban"
            },
            {
                "prompts": ["This video was taken in a rural environment", "There shouldn't be a ton of human development", "Buildings should be extremely spread out", "Should mostly be nature", "Very few humans around"],
                "options": ["visual"],
                "conversation_option": "semantic",
                "name": "Rural"
            }
        ],
        "video_ids": [id]
    }
    headers = {
        "accept": "application/json",
        "x-api-key": API_KEY,
        "Content-Type": "application/json"
    }

    response = requests.post(classify_url, json=payload, headers=headers)
    response = response.json()
    
    print(response, file=sys.stderr)
    video_class = response['data'][0]['classes'][0]['name']

    payload = { "metadata": { "Location Type": video_class } }
    headers = {
        "accept": "application/json",
        "x-api-key": API_KEY,
        "Content-Type": "application/json"
    }

    response = requests.put(meta_url, json=payload, headers=headers)
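
The waiting loop in classify_latest_video polls every 60 seconds with no upper bound. One possible variation, sketched here under assumptions, adds a timeout. The wait_for_video helper and its parameters are hypothetical. The fetch_videos argument stands in for a call such as get_video_list(API_KEY) and is passed in so that the sketch runs without network access.

```python
import time

def wait_for_video(fetch_videos, file_name, interval=60, timeout=1800, sleep=time.sleep):
    """Poll fetch_videos() until a video with a matching filename appears, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        for video in fetch_videos():
            if video["metadata"]["filename"] == file_name:
                return video
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{file_name} was not found within {timeout} seconds")
        sleep(interval)

# Illustration with a fake fetcher that finds the video on the third poll.
calls = {"count": 0}

def fake_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        return []
    return [{"_id": "vid-1", "metadata": {"filename": "beach"}}]

found = wait_for_video(fake_fetch, "beach", interval=0, sleep=lambda _: None)
print(found["_id"])
```

Returning the matching video directly avoids tracking a separate index into the list.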

RememberMe - Dementia Assistant

Summary: The project addresses the critical challenge of assisting individuals with dementia in retaining their independence and enhancing their quality of life.

Description: The application is a comprehensive digital support system with a home screen displaying the current date, important reminders, and action buttons. It also has a chatbot that users can ask questions about their lives. The application collects data such as video, audio, and personal notes, and it uses the Twelve Labs Video Understanding Platform to convert multimedia information into text, which is then organized and stored in the chatbot's database. The objective is to provide a seamless and intuitive platform that enables users to recall important details about their lives, manage daily tasks, and maintain connections with people and places that matter to them. The application was developed by Tatiane Wu Li, Pedro Goncalves de Paiva, Aleksei (Alex) Korablev, and Na Le.

GitHub repo: RememberMe

Presentation: RememberMe.

Integration with Twelve Labs

The submit_video_for_processing function uploads a video to the platform by invoking the POST method of the /tasks/external-provider endpoint. If the upload is successful (HTTP status 201), the function returns the unique identifier of the submitted task. Otherwise, it prints the status code and the response body, which helps developers identify the reason for the failure, and returns None.

import requests
from pprint import pprint

# Constants
API_URL = "https://api.twelvelabs.io/v1.2"
API_KEY = "<YOUR_API_KEY>"
INDEX_ID = "<YOUR_INDEX_ID>"  # Replace with your actual index ID obtained from creating an index

# Function to submit the URL of a video hosted by an external provider for indexing
def submit_video_for_processing(video_url):
    """Submit a video URL to the /tasks/external-provider endpoint and return the task ID."""
    TASKS_URL = f"{API_URL}/tasks/external-provider"
    headers = {"x-api-key": API_KEY}
    data = {"index_id": INDEX_ID, "url": video_url}
    response = requests.post(TASKS_URL, headers=headers, json=data)
    if response.status_code == 201:
        task_id = response.json().get("_id")
        print(f"Task submitted successfully. Task ID: {task_id}")
        return task_id
    else:
        print(f"Failed to submit task: {response.status_code}")
        pprint(response.json())
        return None

# Example usage
video_url = "https://www.youtube.com/watch?v=TLwhqmf4Td4&ab_channel=RGSACHIN"
task_id = submit_video_for_processing(video_url)

The get_video_summary function takes the unique identifier of a video as a parameter and invokes the POST method of the /generate endpoint to summarize it. If successful, it returns the generated summary; otherwise, it prints an error message and returns None.

def get_video_summary(video_id):
    GENERATE_URL = f"{API_URL}/generate"  # Define the URL to generate the summary
    headers = {"x-api-key": API_KEY}  # Set up the authentication header
    data = {"video_id": video_id, "prompt": "Make a summary"}  # Set up the data payload
    response = requests.post(GENERATE_URL, headers=headers, json=data)  # Make the POST request
    if response.status_code == 200:
        summary = response.json().get('data')  # Get the summary data from the response
        print("Video summary generated successfully.")
        return summary  # Return the summary
    else:
        print(f"Failed to generate summary: {response.status_code}")  # Print failure message
        pprint(response.json())
        return None  # Return None if summary generation fails
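
The description mentions that generated text is stored for the chatbot. How the project does this is not shown here, so the following is only a sketch of one possible approach using SQLite from the Python standard library. The table layout and the create_memory_store, save_summary, and find_memories helpers are all hypothetical.

```python
import sqlite3

def create_memory_store(path=":memory:"):
    """Open a SQLite database with a single table for summarized memories."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS memories (video_id TEXT PRIMARY KEY, summary TEXT NOT NULL)"
    )
    return conn

def save_summary(conn, video_id, summary):
    """Insert or replace the summary text for a video."""
    conn.execute(
        "INSERT OR REPLACE INTO memories (video_id, summary) VALUES (?, ?)",
        (video_id, summary),
    )
    conn.commit()

def find_memories(conn, term):
    """Return (video_id, summary) rows whose summary contains the term."""
    cursor = conn.execute(
        "SELECT video_id, summary FROM memories WHERE summary LIKE ? ORDER BY video_id",
        (f"%{term}%",),
    )
    return cursor.fetchall()

# Made-up data for illustration only.
conn = create_memory_store()
save_summary(conn, "vid-1", "A birthday dinner with Maria at the lake house.")
save_summary(conn, "vid-2", "A walk in the park with the dog.")
matches = find_memories(conn, "birthday")
print(matches)
```

A chatbot could pass the matching rows to its language model as context when answering a question.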

CamSense AI

Summary: "CamSense AI" is an AI-powered application that assesses webcam videos, providing instant insights and alerts. It uses the Twelve Labs Video Understanding Platform to analyze video content and identify significant changes or events.

Description: The application addresses the challenge of creating custom triggers based on an understanding of the content of unattended video recordings. This solution is particularly useful in ecology, fire safety, and flood water level monitoring.

The typical workflow is as follows:

The Twelve Labs Video Understanding Platform generates embeddings for the reference frame and the subsequent video clips and summarizes them.
The application uses Groq to produce natural language descriptions of the differences.
The application determines the significance of these differences.
Clips that differ significantly are logged along with their timestamps and descriptions.
The process concludes with the aggregation of all logs into a final report.

The application was developed by Daniel Talero, Paul Kubie, and Todd Gardiner.

Colab notebook: hackathon.ipynb

Integration with Twelve Labs

The code below creates a video indexing task that uploads a video to the Twelve Labs Video Understanding Platform by invoking the create method of the task object:

video_files = glob(reference_filename)  # Example: "/videos/*.mp4"

# Upload the reference video
print(f"Uploading {reference_filename}")
task = client.task.create(index_id=index_obj.id, file=reference_filename, language="en")
print(f"Task id={task.id}")
print(f"Task_video_id = {task.video_id}")
ref_id = task.video_id

# Upload each subsequent video and collect its video ID
frame_id = []
for raw_name in rawvids:
    raw_path = "/content/rawdata/" + str(raw_name)
    video_files = glob(raw_path)  # Example: "/videos/*.mp4"
    print(f"Uploading {raw_name}")
    task = client.task.create(index_id=index_obj.id, file=raw_path, language="en")
    print(f"Task id={task.id}")
    frame_id.append(task.video_id)

The code below invokes the create method of the embed.task object to create an embedding for the reference frame:

task = client.embed.task.create(
   engine_name="Marengo-retrieval-2.6",
   video_url="https://storage.googleapis.com/lab-storage-items/sample-5s.mp4")
print(
   f"Created task: id={task.id} engine_name={task.engine_name} status={task.status}"
)


def on_task_update(task: EmbeddingsTask):
   print(f"  Status={task.status}")


status = task.wait_for_done(
 sleep_interval=5,
 callback=on_task_update
 )
print(f"Embedding done: {status}")
task = client.embed.task.retrieve(task.id)
if task.video_embeddings is not None:
   for v in task.video_embeddings:
       print(
           f"embedding_scope={v.embedding_scope} start_offset_sec={v.start_offset_sec} end_offset_sec={v.end_offset_sec}"
       )
       print(f"embeddings: {', '.join([str(x) for x in v.embedding.float])}")
       ref_emb = np.array(v.embedding.float, dtype=float)  # Store the embedding as a numeric array

The code below creates embeddings for the subsequent frames:

fref_emb = []
for i in range(len(rawvids)):


 task = client.embed.task.create(
     engine_name="Marengo-retrieval-2.6",
     video_url="https://storage.googleapis.com/lab-storage-items/sample-5s.mp4")
 print(
     f"Created task: id={task.id} engine_name={task.engine_name} status={task.status}"
 )

 status = task.wait_for_done(
   sleep_interval=5,
   callback=on_task_update
   )
 print(f"Embedding done: {status}")
 task = client.embed.task.retrieve(task.id)
 if task.video_embeddings is not None:
     for v in task.video_embeddings:
         print(
             f"embedding_scope={v.embedding_scope} start_offset_sec={v.start_offset_sec} end_offset_sec={v.end_offset_sec}"
         )
         print(f"embeddings: {', '.join([str(x) for x in v.embedding.float])}")
         fref_emb.append(np.array(v.embedding.float, dtype=float))  # Store each embedding as a numeric array
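
One way to compare the reference embedding with the clip embeddings is cosine similarity. This is a sketch only and not the project's method, which uses Groq to describe differences before judging their significance. It assumes the embeddings are numeric vectors of equal length, and the flag_changed_clips helper and its threshold value are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)

def flag_changed_clips(reference, clips, threshold=0.9):
    """Return indices of clips whose similarity to the reference falls below threshold."""
    return [i for i, clip in enumerate(clips) if cosine_similarity(reference, clip) < threshold]

# Toy 3-dimensional vectors for illustration; real embeddings are much longer.
reference = [1.0, 0.0, 0.0]
clips = [[0.99, 0.05, 0.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
changed = flag_changed_clips(reference, clips)
print(changed)
```

The threshold would need tuning against real footage, since what counts as a significant change depends on the camera and scene.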

The code below invokes the summarize method of the generate object to summarize the reference frame:

res = client.generate.summarize(ref_id, type='summary', prompt="In a detailed way, describe this video clip." )

The code below summarizes each subsequent video and stores the results in a list:

fres = []
fres_emb = []
for i in range(len(rawvids)):
 res2 = client.generate.summarize(frame_id[i], type='summary', prompt="In a detailed way, describe this video clip." )
 fres.append(res2)
 fres_emb.append(model.encode(res2.summary))