Use the TwelveLabs Video Understanding Platform to find specific moments in your video content using natural language queries or reference images. The platform analyzes videos by integrating images, audio, speech, and text, which provides a deeper understanding than single-modal methods. It captures complex relationships between these elements, detects subtle details, and accepts both natural language and image queries, so you can search intuitively and precisely.

Key features:

Improved accuracy: Integrating multiple modalities produces more accurate results than analyzing a single modality.
Easy interaction: Natural language queries simplify searches.
Advanced search: Enables image-based queries for precise results.
Fewer errors: Multi-faceted analysis reduces misinterpretation.
Time savings: Quickly finds relevant clips without manual review.

Use cases:

Spoken word search: Find video segments where specific words or phrases are spoken.
Visual element search: Locate video segments that match descriptions of visual elements or scenes.
Action or event search: Identify video segments that depict specific actions or events.
Image similarity search: Find video segments that visually resemble a provided image.
Entity search: Locate video segments containing specific people, car models, animal species, or branded objects with improved accuracy.

For details on how your usage is measured and billed, see the Pricing page.

Key concepts

This section explains the key concepts and terminology used in this guide:

Index: A container that organizes your video content.
Asset: Your uploaded content. Once created, you can reference the same asset across multiple operations without uploading the file again.
Indexed asset: An asset that has been indexed and is ready for downstream tasks.

Workflow

Before you can search your videos, you must upload and index them. The platform indexes videos asynchronously, so you can search a video only after its indexing completes. Search results are video segments that match your search terms.

Types of search queries

The platform supports three types of search queries:

Text queries: Search using natural language descriptions of visual elements, actions, sounds, or spoken words
Image queries: Search using images to find visually similar content in your videos
Composed queries: Combine text descriptions with images for more precise results

For guidance on choosing the correct query type, see the Search with text, image, and composed queries page.

Search scope

You can search within a single index per request. You cannot search at the video level or across multiple indexes simultaneously.

Customize your search

You can customize your search in the following ways:

Specify which modalities to use: visual, audio, or transcription (spoken words)
Choose how to combine modalities: use the or operator or the and operator
For searches within spoken words, select the match type: lexical, semantic, or both
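To make the difference between the two operators concrete, the sketch below models each modality's matches as a set of segment identifiers. This is a conceptual illustration only, assuming that or behaves like a set union and and like a set intersection. It does not describe how the platform computes or ranks matches, and the segment identifiers are made up.

```python
def combine_matches(matches_by_modality: dict, operator: str = "or") -> set:
    """Conceptual model of the or/and operators (not the platform's implementation).

    "or":  a segment qualifies if it matches any modality (set union).
    "and": a segment qualifies only if it matches every modality (set intersection).
    """
    if operator not in ("or", "and"):
        raise ValueError(f"Unsupported operator: {operator!r}")
    sets = list(matches_by_modality.values())
    if not sets:
        return set()
    if operator == "or":
        return set().union(*sets)
    return set.intersection(*sets)


# Hypothetical segment IDs matched by each modality
matches = {
    "visual": {"seg-1", "seg-2", "seg-3"},
    "audio": {"seg-2", "seg-3", "seg-4"},
}
print(sorted(combine_matches(matches, "or")))   # ['seg-1', 'seg-2', 'seg-3', 'seg-4']
print(sorted(combine_matches(matches, "and")))  # ['seg-2', 'seg-3']
```

In this model, and can only narrow the result set relative to or, which is why and is the choice when you need segments that match on every selected modality.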

Prerequisites

To use the platform, you need an API key:

1. If you don’t have an account, sign up for a free account.
2. Go to the API Keys page.
3. If you need to create a new key, select the Create API Key button. Enter a name and set the expiration period. The default is 12 months.
4. Select the Copy icon next to your key to copy it to your clipboard.
Install the TwelveLabs SDK. For Python, enter the following command:
```
$ pip install --upgrade twelvelabs
```
Your video files must meet the following requirements:
- Upload limits: Public video URLs up to 4 GB or local video files up to 200 MB. For local files up to 4 GB, see the Upload and processing methods page.
- Model capabilities: See the complete requirements for resolution, aspect ratio, and supported formats.
If you want to use images as queries, ensure that your image files meet the requirements.

Complete example

Copy and paste the code below, replacing the placeholders surrounded by <> with your values.

The example below uses a text query. For image and composed queries, see the Search with text, image, and composed queries page.

```python
import time
from twelvelabs import TwelveLabs

# 1. Initialize the client
client = TwelveLabs(api_key="<YOUR_API_KEY>")

# 2. Create an index
# An index is a container for organizing your video content
index = client.indexes.create(
    index_name="<YOUR_INDEX_NAME>",
    models=[{"model_name": "marengo3.0", "model_options": ["visual", "audio"]}],
)
if not index.id:
    raise RuntimeError("Failed to create an index.")
print(f"Created index: id={index.id}")

# 3. Upload a video
asset = client.assets.create(
    method="url",
    # Use direct links to raw media files. Video hosting platforms and
    # cloud storage sharing links are not supported.
    url="<YOUR_VIDEO_URL>",
    # Or use method="direct" and file=open("<PATH_TO_VIDEO_FILE>", "rb") to upload a local file up to 200 MB
)
print(f"Created asset: id={asset.id}")

# 4. Check the status of the asset
print("Waiting for asset to be ready...")
while True:
    asset = client.assets.retrieve(asset.id)
    if asset.status == "ready":
        print("Asset is ready")
        break
    if asset.status == "failed":
        raise RuntimeError(f"Asset processing failed: id={asset.id}")
    time.sleep(5)

# 5. Index your video
indexed_asset = client.indexes.indexed_assets.create(
    index_id=index.id,
    asset_id=asset.id,
    # enable_video_stream=True,  # Optional: store the video for streaming
)
print(f"Created indexed asset: id={indexed_asset.id}")

# 6. Monitor the indexing process
print("Waiting for indexing to complete...")
while True:
    indexed_asset = client.indexes.indexed_assets.retrieve(
        index_id=index.id,
        indexed_asset_id=indexed_asset.id,
    )
    print(f"  Status={indexed_asset.status}")
    if indexed_asset.status == "ready":
        print("Indexing complete!")
        break
    if indexed_asset.status == "failed":
        raise RuntimeError(f"Indexing failed: id={indexed_asset.id}")
    time.sleep(5)

# 7. Perform a search request
search_results = client.search.query(
    index_id=index.id,
    query_text="<YOUR_QUERY>",
    search_options=["visual", "audio"],
    # operator="or",  # Optional: use "and" to find segments matching all modalities
    # transcription_options=["lexical", "semantic"],  # Optional: requires "transcription" in search_options
)

# 8. Process the search results
print("\nSearch results:")
print("Each result shows a video clip that matches your query:\n")
for i, clip in enumerate(search_results):
    print(f"Result {i + 1}:")
    print(f"  Video ID: {clip.video_id}")  # Unique identifier of the video
    print(f"  Rank: {clip.rank}")  # Relevance ranking (1 = most relevant)
    print(f"  Time: {clip.start}s - {clip.end}s", end="\n\n")  # When this moment occurs in the video
```

Code explanation


Import the SDK and initialize the client

Create a client instance to interact with the TwelveLabs Video Understanding Platform.
Function call: You call the constructor of the TwelveLabs class.
Parameters:

api_key: The API key to authenticate your requests to the platform.

Return value: An object of type TwelveLabs configured for making API calls.
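Rather than hardcoding the key as the example does, you may prefer to read it from an environment variable so it stays out of source control. The sketch below assumes a variable named TWELVE_LABS_API_KEY; that name is a convention chosen for this example, and load_api_key is a hypothetical helper, not part of the SDK.

```python
import os


def load_api_key(env_var: str = "TWELVE_LABS_API_KEY") -> str:
    """Return the API key stored in env_var, or fail with a clear message.

    Hypothetical helper: the default variable name is an assumption for this
    example; use whatever name your environment defines.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your TwelveLabs API key.")
    return key
```

You would then pass the result to the constructor, for example `client = TwelveLabs(api_key=load_api_key())`.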

Create an index

Indexes store and organize your video data, allowing you to group related videos. This guide shows how to create one, but you can also use an existing index.
Function call: You call the indexes.create function.
Parameters:

index_name: The name of the index.
models: An array specifying your model configuration. This example enables the Marengo video understanding model and specifies that it analyzes visual and audio modalities.

See the Indexes page for more details on creating an index and specifying the model configuration.

Return value: An object of type IndexesCreateResponse containing a field named id representing the unique identifier of the newly created index.

Upload a video

Upload a video to create an asset.
Function call: You call the assets.create function.
Parameters:

method: The upload method for your asset. Use url for a publicly accessible URL or direct to upload a local file. This example uses url.
url or file: The publicly accessible URL of your video or an opened file object in binary read mode. This example uses url.

Return value: An object of type Asset. This object contains, among other information, a field named id representing the unique identifier of your asset.

Note

For local files larger than 200 MB, use multipart uploads. Multipart uploads support automatic retry, progress tracking, parallel chunk uploads, and improved reliability, performance, and observability.
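The size limits in this guide suggest a simple rule for choosing an upload path for local files. The sketch below is a hypothetical helper, not part of the SDK: it returns "direct" for files up to 200 MB and "multipart" for larger files up to 4 GB. It treats 1 MB as 1024 × 1024 bytes, which is an assumption; check the Upload and processing methods page for the exact limits.

```python
# Assumption: the documented limits use binary megabytes (1 MB = 1024 * 1024 bytes)
MB = 1024 * 1024
DIRECT_UPLOAD_LIMIT = 200 * MB        # Largest local file for method="direct"
MAX_LOCAL_FILE_SIZE = 4 * 1024 * MB   # 4 GB ceiling for local files


def choose_upload_path(size_bytes: int) -> str:
    """Return "direct" or "multipart" for a local file of the given size.

    Hypothetical helper: "multipart" is a label for this sketch, not a value
    of the method parameter of assets.create.
    """
    if size_bytes <= 0:
        raise ValueError("File is empty.")
    if size_bytes <= DIRECT_UPLOAD_LIMIT:
        return "direct"
    if size_bytes <= MAX_LOCAL_FILE_SIZE:
        return "multipart"
    raise ValueError("File exceeds the 4 GB limit for local files.")
```

For a file on disk, you could call `choose_upload_path(os.path.getsize(path))` and branch on the result.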

Check the status of the asset

Asset processing is asynchronous. Poll the status of the asset until it is ready before you use it.
Function call: You call the assets.retrieve function.
Parameters:

asset_id: The unique identifier of your asset.

Return value: An object of type Asset containing, among other information, a field named status representing the current status of the asset. Check this field until its value is ready.
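Both polling loops in the complete example wait indefinitely. If you prefer a bounded wait, you could factor the loop into a helper like the one below. It is a sketch, not an SDK feature: wait_until_ready, its timeout, and its interval defaults are assumptions to adapt, and it relies only on the ready and failed status values described in this guide.

```python
import time
from typing import Any, Callable


def wait_until_ready(
    fetch: Callable[[], Any],
    interval: float = 5.0,
    timeout: float = 600.0,
) -> Any:
    """Call fetch() until the returned object's status is "ready".

    Raises RuntimeError if the status becomes "failed" and TimeoutError if
    `timeout` seconds elapse first. Hypothetical helper, not part of the SDK.
    """
    deadline = time.monotonic() + timeout
    while True:
        resource = fetch()
        if resource.status == "ready":
            return resource
        if resource.status == "failed":
            raise RuntimeError(f"Processing failed: id={getattr(resource, 'id', None)}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Status still {resource.status!r} after {timeout} seconds")
        time.sleep(interval)
```

With the client from the example, `asset_id = asset.id` followed by `asset = wait_until_ready(lambda: client.assets.retrieve(asset_id))` replaces the first loop, and the same helper can wrap `client.indexes.indexed_assets.retrieve(...)` for the indexing loop. Using `time.monotonic` keeps the deadline unaffected by system clock changes.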

Index your video

Index your video by adding the asset you uploaded to an index. This operation is asynchronous.
Function call: You call the indexes.indexed_assets.create function.
Parameters:

index_id: The unique identifier of the index to which you want to add the asset.
asset_id: The unique identifier of your asset.
(Optional) enable_video_stream: Specifies whether the platform stores the video for streaming. When set to True, you can retrieve the URL of the stored video by calling the indexes.indexed_assets.retrieve method.

Return value: An object of type IndexedAssetsCreateResponse. This object contains a field named id representing the unique identifier of your indexed asset.

Monitor the indexing process

Indexing takes some time. Poll the status of the indexed asset until indexing completes.
Function call: You call the indexes.indexed_assets.retrieve function.
Parameters:

index_id: The unique identifier of your video index.
indexed_asset_id: The unique identifier of your indexed asset.

Return value: An object of type IndexedAssetDetailed containing, among other information, a field named status representing the status of the indexing process. Wait until the value of this field is ready.

Perform a search request

Perform a search within your index using a text query, an image query, or a combination of both. This example uses a text query.

Function call: You call the search.query method.
Parameters:

index_id: The unique identifier of the index.
query_text: Your search query, written in natural language. The maximum query length is 500 tokens.
search_options: The modalities the platform uses when performing a search. This example searches using visual and audio cues. For details, see the Search options section.
(Optional) operator: Combines multiple search options using or (default) or and. Use and to find segments matching all search options. Use or to find segments matching any search option.
(Optional) transcription_options: Specifies how the platform matches your query against spoken words. This parameter applies only when transcription is included in search_options. Available options are lexical, semantic, or both (default). For details, see the Transcription options section.
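You can check these parameters before sending a request. The helper below is a hypothetical client-side check, not part of the SDK, and it is stricter than necessary in places: it encodes only the rules stated in this guide (search_options drawn from visual, audio, and transcription; operator set to or or and; transcription_options drawn from lexical and semantic, and allowed only when transcription is in search_options). It returns a dictionary you can unpack into search.query.

```python
from __future__ import annotations

# Allowed values as described in this guide (an assumption: the API may accept others)
VALID_SEARCH_OPTIONS = {"visual", "audio", "transcription"}
VALID_OPERATORS = {"or", "and"}
VALID_TRANSCRIPTION_OPTIONS = {"lexical", "semantic"}


def build_search_kwargs(
    index_id: str,
    query_text: str,
    search_options: list[str],
    operator: str | None = None,
    transcription_options: list[str] | None = None,
) -> dict:
    """Validate search parameters and return keyword arguments for search.query.

    Hypothetical helper: the checks mirror this guide, not the API's own validation.
    """
    if not query_text.strip():
        raise ValueError("query_text must not be empty.")
    unknown = set(search_options) - VALID_SEARCH_OPTIONS
    if not search_options or unknown:
        raise ValueError(f"Invalid search_options: {sorted(unknown) or search_options}")
    kwargs = {
        "index_id": index_id,
        "query_text": query_text,
        "search_options": list(search_options),
    }
    if operator is not None:
        if operator not in VALID_OPERATORS:
            raise ValueError(f"operator must be 'or' or 'and', got {operator!r}")
        kwargs["operator"] = operator
    if transcription_options is not None:
        # transcription_options applies only when transcription is a search option
        if "transcription" not in search_options:
            raise ValueError('transcription_options requires "transcription" in search_options.')
        bad = set(transcription_options) - VALID_TRANSCRIPTION_OPTIONS
        if not transcription_options or bad:
            raise ValueError(f"Invalid transcription_options: {transcription_options}")
        kwargs["transcription_options"] = list(transcription_options)
    return kwargs
```

For example, `client.search.query(**build_search_kwargs(index.id, "<YOUR_QUERY>", ["visual", "audio"], operator="and"))` sends the same request as the complete example with the and operator enabled.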

Return value: An object of type SyncPager[SearchItem] that can be iterated to access search results. Each item contains the following fields, among other information:

video_id: The unique identifier of the video that matched your search terms.
start: The start time of the matching video clip, expressed in seconds.
end: The end time of the matching video clip, expressed in seconds.
rank: The relevance ranking assigned by the model. Lower numbers indicate higher relevance, starting with 1 for the most relevant result.
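Because lower rank values mean higher relevance, you can reorder or group results on the client after you retrieve them. The sketch below is a client-side helper, not the platform's grouping feature. group_by_video is a hypothetical name, and it works on any objects with video_id and rank attributes; here, SimpleNamespace objects with made-up values stand in for real SearchItem objects.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_video(clips):
    """Group clips by video_id; each group is ordered by rank (1 = most relevant).

    Groups appear in the order of their best-ranked clip.
    """
    groups = defaultdict(list)
    for clip in sorted(clips, key=lambda c: c.rank):
        groups[clip.video_id].append(clip)
    return dict(groups)


# Stand-ins for SearchItem objects; the values are made up
clips = [
    SimpleNamespace(video_id="vid-a", rank=3, start=40.0, end=46.5),
    SimpleNamespace(video_id="vid-b", rank=1, start=12.0, end=18.0),
    SimpleNamespace(video_id="vid-a", rank=2, start=5.0, end=9.0),
]
for video_id, video_clips in group_by_video(clips).items():
    print(video_id, [c.rank for c in video_clips])
# vid-b [1]
# vid-a [2, 3]
```

With real results, you would pass `search_results` (or a list built from it) in place of the stand-in clips.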

Process the search results

This example iterates over the results with a for loop and prints each one to standard output. Each result includes the video ID, relevance ranking, and the time range where the match occurs.
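The example prints start and end as raw seconds. For a more readable time range, you could add a small formatter like the one below. It is an illustrative sketch: format_timestamp and format_range are hypothetical helpers, and rounding to the nearest whole second and the m:ss or h:mm:ss layout are arbitrary choices.

```python
def format_timestamp(seconds: float) -> str:
    """Format a non-negative number of seconds as m:ss, or h:mm:ss from one hour up."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    total = int(round(seconds))  # Round to the nearest whole second
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def format_range(start: float, end: float) -> str:
    """Format a clip's start and end times as a readable range."""
    return f"{format_timestamp(start)} - {format_timestamp(end)}"


print(format_range(12.4, 75.0))    # 0:12 - 1:15
print(format_range(3599.6, 3725))  # 1:00:00 - 1:02:05
```

In the results loop, `print(f"  Time: {format_range(clip.start, clip.end)}")` would replace the raw-seconds line.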

Next steps

Learn more about searching with text, image, and composed queries for best practices and advanced techniques.
Explore entity search to find specific people in your videos.
Learn query engineering techniques to refine your search queries.
Apply grouping to cluster search results from the same video together.
Implement filtering to narrow down results based on specific criteria.