Extract text recognized in video (OCR)

At a high level, extracting text recognized in a video (OCR) involves the following steps:

  • Upload and index a video. When you upload a video by calling the POST method of the /tasks endpoint, the platform creates a video indexing task and returns its unique identifier.
  • Retrieve the unique identifier of your video. Once the platform finishes indexing your video, you can retrieve the unique identifier of the video by calling the GET method of the /tasks/{task-id} endpoint and passing it the unique identifier of your video indexing task.
  • Extract the text that appears in your video. Use the GET method of the /indexes/{index-id}/videos/{video-id}/text-in-video endpoint, passing it the unique identifiers of your index and video. For a description of each field in the request and response, see the API Reference > Retrieve text recognized in a video (OCR) page.
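
The three steps above map onto three endpoint paths. As a minimal sketch, the helpers below only assemble those URLs; they don't send any requests. The base URL is a placeholder and the function names are hypothetical, introduced here for illustration only.

```python
# Hypothetical base URL; replace it with the API_URL value you use elsewhere.
API_URL = "https://api.example.com/v1"


def tasks_url(api_url: str) -> str:
    """URL of the /tasks endpoint (POST creates a video indexing task)."""
    return f"{api_url}/tasks"


def task_url(api_url: str, task_id: str) -> str:
    """URL of the /tasks/{task-id} endpoint (GET retrieves a task)."""
    return f"{api_url}/tasks/{task_id}"


def text_in_video_url(api_url: str, index_id: str, video_id: str) -> str:
    """URL of the text-in-video endpoint (GET retrieves recognized text)."""
    return f"{api_url}/indexes/{index_id}/videos/{video_id}/text-in-video"
```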


Prerequisites

  • You're familiar with the concepts that are described on the Platform overview page.
  • You've uploaded a video, and the platform has finished indexing it. The unique identifier of your video is stored in a variable named VIDEO_ID. For details, see the Upload videos page.
  • The examples below also rely on the API_URL, INDEX_ID, and headers variables being defined.


Procedure

  1. Construct the URL for retrieving the text recognized in your video based on the API_URL, INDEX_ID, and VIDEO_ID variables:

    Python:

      TEXT_IN_VIDEO_URL = f"{API_URL}/indexes/{INDEX_ID}/videos/{VIDEO_ID}/text-in-video"

    Node.js:

      const TEXT_IN_VIDEO_URL = `${API_URL}/indexes/${INDEX_ID}/videos/${VIDEO_ID}/text-in-video`

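  2. Retrieve the text and print it out to the console:

    Python:

      import requests
      from pprint import pprint

      # Assumes headers is already defined
      response = requests.get(TEXT_IN_VIDEO_URL, headers=headers)
      print(f"Status code: {response.status_code}")
      pprint(response.json())

    Node.js:

      // Assumes axios is imported, headers is already defined,
      // and this code runs in an async context
      const config = {
        method: 'get',
        url: TEXT_IN_VIDEO_URL,
        headers: headers,
      }
      const resp = await axios(config)
      const response = resp.data
      console.log(`Status code: ${resp.status}`)
      console.log(JSON.stringify(response, null, 2))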

    The following example output was truncated for brevity:

    Status code: 200
    {
      "_id": "62aeb771154f59c87660ce91",
      "data": [
        { "end": 77, "start": 76, "value": "DID YOU GET THAT ON VIDEO?" },
        { "end": 79, "start": 78, "value": "YEAH" },
        { "end": 102, "start": 101, "value": "ALRIGHTY HERE WE GO" },
        { "end": 295, "start": 294, "value": "GOT LIKE A BIG STICK OR SOMETHING" }
      ],
      "index_id": "629deb409ea24f052b971993"
    }

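Once you have the parsed response, you may want a more readable listing than the raw JSON. The sketch below assumes the response has the shape shown in the example output (a data array of objects with start, end, and value fields) and that start and end are expressed in seconds. The format_ocr_entries helper is hypothetical and not part of the API.

```python
from typing import Any


def format_ocr_entries(payload: dict[str, Any]) -> list[str]:
    """Return one 'start-end: value' line per entry, ordered by start time."""
    entries = sorted(payload.get("data", []), key=lambda entry: entry["start"])
    return [f"{e['start']}s-{e['end']}s: {e['value']}" for e in entries]


# Sample payload using two entries from the example output.
sample = {
    "_id": "62aeb771154f59c87660ce91",
    "data": [
        {"end": 79, "start": 78, "value": "YEAH"},
        {"end": 77, "start": 76, "value": "DID YOU GET THAT ON VIDEO?"},
    ],
    "index_id": "629deb409ea24f052b971993",
}

for line in format_ocr_entries(sample):
    print(line)
```

In the Python example, you would pass response.json() to the helper instead of the sample dictionary.
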
    Note that you can use the start and end query parameters to specify the time range for which you want to retrieve the text. The following example URL retrieves the text for the first 10 seconds of the video:

    Python:

      TEXT_IN_VIDEO_URL = f"{API_URL}/indexes/{INDEX_ID}/videos/{VIDEO_ID}/text-in-video?start=0&end=10"

    Node.js:

      const TEXT_IN_VIDEO_URL = `${API_URL}/indexes/${INDEX_ID}/videos/${VIDEO_ID}/text-in-video?start=0&end=10`
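
If the time range varies between calls, building the query string by hand becomes error-prone. The following sketch uses the standard library's urlencode to append start and end only when they are provided. The build_text_in_video_url helper and the placeholder base URL in the usage line are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def build_text_in_video_url(
    api_url: str,
    index_id: str,
    video_id: str,
    start: Optional[int] = None,
    end: Optional[int] = None,
) -> str:
    """Build the text-in-video URL, adding start/end only when they are set."""
    base = f"{api_url}/indexes/{index_id}/videos/{video_id}/text-in-video"
    params = {}
    if start is not None:
        params["start"] = start
    if end is not None:
        params["end"] = end
    # urlencode handles escaping and joins the pairs with "&".
    return f"{base}?{urlencode(params)}" if params else base


# Hypothetical identifiers, for illustration only.
url = build_text_in_video_url("https://api.example.com/v1", "i1", "v1", start=0, end=10)
print(url)
```

With the requests library, an alternative is to keep the URL without a query string and pass the range through the params argument, for example requests.get(TEXT_IN_VIDEO_URL, headers=headers, params={"start": 0, "end": 10}).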