Streaming
Receive response tokens in real time via Server-Sent Events instead of waiting for the complete response. Use streaming when you want tokens to appear as they are generated, show progress for long responses, or reduce perceived latency in a chat-like interface. This guide covers how to enable streaming, parse the event stream, and handle the connection lifecycle.
Key concepts
- Server-Sent Events (SSE): A standard for streaming data over HTTP. The server sends a sequence of data: lines, each containing a JSON event. The stream ends with a data: [DONE] signal.
- Streaming mode: When stream is true, the API returns an SSE stream instead of a single JSON response. You must also enable streaming on the HTTP client side.
Prerequisites
- You’ve already uploaded your content, and the asset has reached the ready status. See the Upload content page for details.
- You’ve already created a knowledge store. See the Create a knowledge store page for details.
- You’ve already added at least one asset to the knowledge store, and the item has reached the ready status. See the Add assets page for details.
- You’ve already read the Create a response page and understand the basic request and response format.
Enable streaming
To stream a response, set stream to true in the request body. You must also pass stream=True to the requests.post() call so the HTTP client reads the response incrementally rather than buffering the entire body.
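The two settings described above can be sketched as follows. This is a minimal illustration, not the API's reference client: the URL, the Bearer authorization header, and the "input" body field are placeholders assumed for the example, so substitute the values from the Create a response page.

```python
def build_stream_request(api_url: str, api_key: str, body: dict) -> dict:
    """Return keyword arguments for requests.post() with streaming on in both places."""
    payload = {**body, "stream": True}  # 1) JSON body: ask the API for an SSE stream
    return {
        "url": api_url,  # placeholder: use your real endpoint
        "headers": {"Authorization": f"Bearer {api_key}"},  # ASSUMPTION: Bearer auth
        "json": payload,
        "stream": True,  # 2) HTTP client: read incrementally instead of buffering
        "timeout": 60,
    }


def open_stream(api_url: str, api_key: str, body: dict):
    """Send the request and return the open, still-unread response."""
    import requests  # third-party (pip install requests); imported lazily

    response = requests.post(**build_stream_request(api_url, api_key, body))
    response.raise_for_status()
    return response


# "input" is a hypothetical field name used only for illustration.
kwargs = build_stream_request(
    "https://api.example.com/responses", "YOUR_API_KEY", {"input": "Hello"}
)
```

Building the keyword arguments separately makes it easy to check that both flags are set before any network call is made.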
Parse the event stream
The response is a sequence of data: lines. Each line contains a JSON object representing one event. The final line is data: [DONE], which signals the end of the stream.
A typical event looks like this:
Events arrive incrementally - each delta contains a fragment of the generated text. Concatenate the deltas to build the full response.
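A hedged sketch of the parsing loop described above: strip the data: prefix, stop at [DONE], decode each remaining line as JSON, and join the text fragments. The event shape is an assumption here. The code guesses that each event carries its fragment in a top-level "delta" string field, so adjust the lookup to match the events your stream actually returns.

```python
import json
from typing import Iterable, Iterator, Union


def iter_events(lines: Iterable[Union[str, bytes]]) -> Iterator[dict]:
    """Yield one decoded JSON event per data: line, stopping at [DONE]."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not data
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end-of-stream signal: stop reading
        yield json.loads(data)


def collect_text(lines: Iterable[Union[str, bytes]]) -> str:
    """Concatenate the text fragments from a stream of SSE lines."""
    parts = []
    for event in iter_events(lines):
        delta = event.get("delta")  # ASSUMPTION: fragment lives in a "delta" string
        if isinstance(delta, str):
            parts.append(delta)
    return "".join(parts)


# Hand-written sample lines, not captured API output.
sample = ['data: {"delta": "Hel"}', "", b'data: {"delta": "lo"}', "data: [DONE]"]
text = collect_text(sample)
```

With the requests library, the same function can consume a live response via collect_text(response.iter_lines()), since iter_lines() yields one line of the body at a time.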
Combine with other features
Streaming works with instructions, structured output, and multi-turn sessions. Set stream to true alongside any other parameters:
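As an illustration only, streaming can be layered onto an existing request body by adding the stream flag and leaving every other parameter untouched. The field names "input" and "instructions" below are hypothetical placeholders; use the parameter names documented on the relevant feature pages.

```python
def with_streaming(body: dict) -> dict:
    """Return a copy of a request body with streaming switched on."""
    return {**body, "stream": True}


# Hypothetical body: "input" and "instructions" are placeholder field names.
base_body = {
    "input": "Summarize the uploaded report.",
    "instructions": "Answer in two sentences.",
}
streaming_body = with_streaming(base_body)
```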
Common pitfalls
- Enable streaming in both places. Setting stream to true in the JSON body tells the API to stream; passing stream=True to requests.post() tells the HTTP client to read incrementally. Missing either one breaks streaming.
- Handle the [DONE] signal. It marks the end of the stream. If your code does not stop reading when it arrives, it may hang waiting for more data.
- No automatic reconnection. If the connection drops, start a new request. SSE reconnection is not built in for this endpoint.
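One way to cover the last two pitfalls is to treat a stream that ends without [DONE] as interrupted and start a fresh request. The sketch below assumes a caller-supplied start_request function (a hypothetical name) that sends a new request and returns its lines. Partial output from a failed attempt is discarded because the stream cannot be resumed.

```python
import json
from typing import Callable, Iterable, List


class StreamInterrupted(Exception):
    """Raised when the stream ends before the [DONE] signal arrives."""


def read_stream(lines: Iterable[str]) -> List[dict]:
    """Collect events until [DONE]; raise if the stream stops early."""
    events = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return events
        events.append(json.loads(data))
    raise StreamInterrupted("stream ended without [DONE]")


def stream_with_retry(
    start_request: Callable[[], Iterable[str]], max_attempts: int = 3
) -> List[dict]:
    """Start a new request after each dropped connection, up to max_attempts."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return read_stream(start_request())
        except (StreamInterrupted, OSError) as exc:
            # OSError covers ConnectionError; requests' exceptions also derive from it.
            last_error = exc  # drop partial events and try again from scratch
    raise RuntimeError(f"stream failed after {max_attempts} attempts") from last_error
```

A production client would likely add a delay between attempts. That is omitted here to keep the sketch short.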
Next steps
- Structured output - get typed JSON back by providing a schema
- Multi-turn sessions - maintain conversation context across requests
- Create a response - review the basic request and response format
Jupyter notebook
Download the notebook to run this guide interactively.