Voxel51 - Semantic Video Search Plugin

Summary: The "Semantic Video Search" plugin integrates Voxel FiftyOne, an open-source tool for building and enhancing machine learning datasets, with the Twelve Labs Video Understanding Platform, enabling you to perform semantic searches across multiple modalities.

Description: The plugin allows you to accurately identify movements, actions, objects, people, sounds, on-screen text, and speech. This is helpful, for example, when you need to quickly locate and analyze specific scenes based on actions or spoken words, significantly speeding up how you categorize and analyze video data.

Step-by-step guide: Our blog post, Search Your Videos Semantically with Twelve Labs and FiftyOne Plugin, walks you through the steps required to create this plugin from scratch.

GitHub: Semantic Video Search

Integration with Twelve Labs

The integration with the Twelve Labs Video Understanding Platform consists of three steps: creating an index, uploading your videos, and performing semantic searches.
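All three steps call the platform's REST API over HTTP. The snippets below assume a few shared definitions, sketched here with an assumed base URL and a placeholder API key; adjust both to match your environment:

import time
from pprint import pprint

import requests
import fiftyone as fo

# Base URL of the Twelve Labs API (the version segment is an assumption;
# use the version your account targets) and your personal API key
API_URL = "https://api.twelvelabs.io/v1.2"
API_KEY = "<YOUR_TWELVE_LABS_API_KEY>"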

Create an index

The plugin invokes the POST method of the /indexes endpoint to create an index and enable the Marengo video understanding engine with the indexing options that the user has selected:

INDEX_NAME = ctx.params.get("index_name")

INDEXES_URL = f"{API_URL}/indexes"

headers = {
    "x-api-key": API_KEY
}

# Collect the indexing options that the user selected
so = []

if ctx.params.get("visual"):
    so.append("visual")
if ctx.params.get("logo"):
    so.append("logo")
if ctx.params.get("text_in_video"):
    so.append("text_in_video")
if ctx.params.get("conversation"):
    so.append("conversation")

# Create the index with the Marengo 2.5 engine and the selected options
data = {
    "engine_id": "marengo2.5",
    "index_options": so,
    "index_name": INDEX_NAME,
}

response = requests.post(INDEXES_URL, headers=headers, json=data)
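The remaining steps need the unique identifier of the index that was just created. Assuming the create-index response returns it in the _id field, one way to capture it is:

# Assumes the create-index response body contains the new index's ID in "_id"
INDEX_ID = response.json().get("_id")
print(f"Created index {INDEX_NAME} with ID {INDEX_ID}")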

Upload videos

The plugin invokes the POST method of the /tasks endpoint to upload each video. Then, it monitors the indexing process using the GET method of the /tasks/{task_id} endpoint:

TASKS_URL = f"{API_URL}/tasks"

videos = target_view
for sample in videos:
    # Skip videos that are too short to index (under 4 seconds)
    if sample.metadata.duration < 4:
        continue

    file_path = sample.filepath
    file_name = file_path.split("/")[-1]

    headers = {
        "x-api-key": API_KEY
    }

    data = {
        "index_id": INDEX_ID,
        "language": "en"
    }

    # Upload the video as a multipart request and close the file when done
    with open(file_path, "rb") as file_stream:
        file_param = [
            ("video_file", (file_name, file_stream, "application/octet-stream")),
        ]
        response = requests.post(TASKS_URL, headers=headers, data=data, files=file_param)

    TASK_ID = response.json().get("_id")
    print(f"Status code: {response.status_code}")
    pprint(response.json())

    # Poll the task until indexing is complete
    TASK_STATUS_URL = f"{API_URL}/tasks/{TASK_ID}"
    while True:
        response = requests.get(TASK_STATUS_URL, headers=headers)
        STATUS = response.json().get("status")
        if STATUS == "ready":
            break
        time.sleep(10)

    # Store the video ID on the sample so search results can be mapped back to it
    VIDEO_ID = response.json().get("video_id")
    sample["Twelve Labs " + INDEX_NAME] = VIDEO_ID
    sample.save()
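Note that the loop above exits only when the task reaches the ready status, so a video that fails to index would keep it waiting indefinitely. A slightly more defensive sketch, assuming the task can also report a failed status, might look like this:

while True:
    response = requests.get(TASK_STATUS_URL, headers=headers)
    STATUS = response.json().get("status")
    if STATUS == "ready":
        break
    if STATUS == "failed":
        # Stop polling this task; the video could not be indexed
        print(f"Indexing failed for {file_name}")
        break
    time.sleep(10)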

Perform semantic searches

The plugin invokes the POST method of the /search endpoint to search across the sources of information that the user has selected:

SEARCH_URL = f"{API_URL}/search"

headers = {
    "x-api-key": API_KEY
}

# Collect the sources of information that the user selected
so = []

if ctx.params.get("visual"):
    so.append("visual")
if ctx.params.get("logo"):
    so.append("logo")
if ctx.params.get("text_in_video"):
    so.append("text_in_video")
if ctx.params.get("conversation"):
    so.append("conversation")

data = {
    "query": prompt,
    "index_id": INDEX_ID,
    "search_options": so,
}

response = requests.post(SEARCH_URL, headers=headers, json=data)
results = response.json()
print(results)

# Map the matching video IDs back to the samples that carry them
video_ids = [entry["video_id"] for entry in results["data"]]
start = [entry["start"] for entry in results["data"]]
end = [entry["end"] for entry in results["data"]]
view1 = target_view.select_by("Twelve Labs " + INDEX_NAME, video_ids, ordered=True)

# Remove the results of any previous search
if "results" in ctx.dataset.get_field_schema().keys():
    ctx.dataset.delete_sample_field("results")

# Attach each match as a temporal detection, converting seconds to frame numbers
for i, sample in enumerate(view1):
    support = [
        int(start[i] * sample.metadata.frame_rate) + 1,
        int(end[i] * sample.metadata.frame_rate) + 1,
    ]
    sample["results"] = fo.TemporalDetection(label=prompt, support=tuple(support))
    sample.save()

# Convert the matches to clips and display them in the App
view2 = view1.to_clips("results")
ctx.trigger("set_view", {"view": view2._serialize()})

return {}
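As a quick sanity check of the seconds-to-frames conversion above: FiftyOne temporal detections use 1-based, inclusive frame numbers, so each match time returned by the platform (in seconds) is scaled by the sample's frame rate. A small example with hypothetical values:

frame_rate = 30.0                  # hypothetical frame rate of a sample
start_sec, end_sec = 12.0, 15.5    # hypothetical match times from /search

support = [int(start_sec * frame_rate) + 1, int(end_sec * frame_rate) + 1]
print(support)  # [361, 466]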

Next steps

After reading this page, you have several options:

  • Use the plugin as-is: Inspect the source code to better understand the platform's features and start using the plugin immediately.
  • Customize and enhance the plugin: Feel free to modify the code to meet your specific requirements.
  • Explore further: Try the applications built by the community or our sample applications to get more insights into the Twelve Labs Video Understanding Platform's diverse capabilities and learn more about integrating the platform into your applications.