# How to stream completions

By default, when you send a prompt to the OpenAI Completions endpoint, it computes the entire completion and sends it back in a single response.

If you're generating long completions from a davinci-level model, waiting for the response can take many seconds. As of Aug 2022, responses from `text-davinci-002` typically take about 1 second plus about 2 seconds per 100 completion tokens.
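To make that rule of thumb concrete, here is a minimal sketch that turns it into a function. The function name and parameter names are illustrative, and the default constants are just the rough Aug 2022 figures quoted above, not measured guarantees.

```python
def estimate_latency_seconds(
    completion_tokens: int,
    overhead_s: float = 1.0,               # rough fixed delay per request (assumed from the text)
    seconds_per_100_tokens: float = 2.0,   # rough generation speed (assumed from the text)
) -> float:
    """Estimate how long a non-streaming completion takes to come back."""
    return overhead_s + seconds_per_100_tokens * completion_tokens / 100


# print the estimate for a few completion lengths
for n in (50, 100, 500):
    print(f"{n} tokens: ~{estimate_latency_seconds(n):.1f} s")
```

Real latency varies with load and model, so treat the result as an order-of-magnitude guide only.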

If you want to get the response faster, you can 'stream' the completion as it's being generated. This allows you to start printing or otherwise processing the beginning of the completion before the entire completion is finished.
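The idea of processing the beginning of a completion before the rest exists can be illustrated without any API call. The sketch below uses a plain Python generator as a stand-in for a streaming response; `fake_stream` and `log` are hypothetical names used only for this illustration.

```python
from typing import Iterator, List

log: List[str] = []  # records the order in which things happen


def fake_stream(pieces: List[str]) -> Iterator[str]:
    """Stand-in for a streaming response: yields one piece at a time."""
    for piece in pieces:
        log.append(f"produced {piece!r}")
        yield piece


received = ""
for piece in fake_stream(["4", ",", "5", ","]):
    # the consumer handles each piece before the next one is produced
    log.append(f"consumed {piece!r}")
    received += piece

print(log[:4])   # production and consumption are interleaved
print(received)
```

The log shows each piece being consumed before the next is produced, which is the same pattern a streamed completion gives you.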

To stream completions, set `stream=True` when calling the Completions endpoint. This will return an object that streams back text as [data-only server-sent events](https://app.mode.com/openai/reports/4fce5ba22b5b/runs/f518a0be4495).
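The Python library parses these events for you, so you normally never see the raw stream. For intuition, here is a rough sketch of what reading data-only server-sent events involves. The sample lines are made up, and treating `[DONE]` as the end-of-stream marker is an assumption about the wire format rather than something stated in this guide.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each `data:` line until the end marker."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # assumed end-of-stream marker
        yield json.loads(payload)


# made-up sample stream, shaped like the events used later in this notebook
raw_lines = [
    'data: {"choices": [{"text": "4", "index": 0}]}',
    "",
    'data: {"choices": [{"text": ",", "index": 0}]}',
    "",
    "data: [DONE]",
]

text = "".join(event["choices"][0]["text"] for event in iter_sse_data(raw_lines))
print(text)
```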

## Downsides

Note that using `stream=True` in a production application makes it more difficult to moderate the content of the completions, because partial text can reach users before the full completion has been evaluated. This has implications for [approved usage](https://beta.openai.com/docs/usage-guidelines).

Another small drawback of streaming responses is that the response no longer includes the `usage` field to tell you how many tokens were consumed. After receiving and combining all of the responses, you can calculate this yourself using [`tiktoken`](How_to_count_tokens_with_tiktoken.ipynb).
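As a sketch of that calculation, the helper below joins the streamed text pieces and counts tokens with whatever encoder you pass in. The helper names are hypothetical. `tiktoken_encoder` assumes `tiktoken` is installed and that it recognizes the model name. Re-encoding the joined text may not always match the model's own token count exactly, and prompt tokens have to be counted separately.

```python
from typing import Callable, Iterable, List


def count_streamed_tokens(
    chunks: Iterable[str],
    encode: Callable[[str], List[int]],
) -> int:
    """Join streamed text pieces and count tokens with the supplied encoder."""
    full_text = "".join(chunks)
    return len(encode(full_text))


def tiktoken_encoder(model: str = "text-davinci-002") -> Callable[[str], List[int]]:
    """Return tiktoken's encode function for `model` (requires `pip install tiktoken`)."""
    import tiktoken  # imported lazily so count_streamed_tokens works without it

    return tiktoken.encoding_for_model(model).encode


# Example usage (not run here because it needs tiktoken):
# n_tokens = count_streamed_tokens(["4", ",", "5", ","], tiktoken_encoder())
```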

## Example code

Below is a Python code example of how to receive streaming completions.

In [1]:
# imports
import openai  # for OpenAI API calls
import time  # for measuring time savings

### A typical completion request

With a typical Completions API call, the text is first computed and then returned all at once.

In [2]:
# Example of an OpenAI Completion request
# https://beta.openai.com/docs/api-reference/completions/create

# record the time before the request is sent
start_time = time.time()

# send a Completion request to count to 100
response = openai.Completion.create(
    model='text-davinci-002',
    prompt='1,2,3,',
    max_tokens=193,
    temperature=0,
)

# calculate the time it took to receive the response
response_time = time.time() - start_time

# extract the text from the response
completion_text = response['choices'][0]['text']

# print the time delay and text received
print(f"Full response received {response_time:.2f} seconds after request")
print(f"Full text received: {completion_text}")

Full response received 7.32 seconds after request
Full text received: 4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100


### A streaming completion request

With a streaming Completions API call, the text is sent back via a series of events. In Python, you can iterate over these events with a `for` loop.

In [3]:
# Example of an OpenAI Completion request, using the stream=True option
# https://beta.openai.com/docs/api-reference/completions/create

# record the time before the request is sent
start_time = time.time()

# send a Completion request to count to 100
response = openai.Completion.create(
    model='text-davinci-002',
    prompt='1,2,3,',
    max_tokens=193,
    temperature=0,
    stream=True,  # this time, we set stream=True
)

# create variables to collect the stream of events
collected_events = []
completion_text = ''
# iterate through the stream of events
for event in response:
    event_time = time.time() - start_time  # calculate the time delay of the event
    collected_events.append(event)  # save the event response
    event_text = event['choices'][0]['text']  # extract the text
    completion_text += event_text  # append the text
    print(f"Text received: {event_text} ({event_time:.2f} seconds after request)")  # print the delay and text

# print the total time delay (event_time still holds the time of the last event) and the full text received
print(f"Full response received {event_time:.2f} seconds after request")
print(f"Full text received: {completion_text}")

Text received: 4 (0.16 seconds after request)
Text received: , (0.19 seconds after request)
Text received: 5 (0.21 seconds after request)
Text received: , (0.24 seconds after request)
Text received: 6 (0.27 seconds after request)
Text received: , (0.29 seconds after request)
Text received: 7 (0.32 seconds after request)
Text received: , (0.35 seconds after request)
Text received: 8 (0.37 seconds after request)
Text received: , (0.40 seconds after request)
Text received: 9 (0.43 seconds after request)
Text received: , (0.46 seconds after request)
Text received: 10 (0.48 seconds after request)
Text received: , (0.51 seconds after request)
Text received: 11 (0.54 seconds after request)
Text received: , (0.56 seconds after request)
Text received: 12 (0.59 seconds after request)
Text received: , (0.62 seconds after request)
Text received: 13 (0.64 seconds after request)
Text received: , (0.67 seconds after request)
Text received: 14 (0.70 seconds after request)
Text received: , (0.72 seconds after request)
...

### Time comparison

In the example above, both requests took about 7 seconds to fully complete.

However, with the streaming request, the first token arrived after 0.16 seconds, and subsequent tokens arrived roughly every 0.035 seconds.
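If you want to compute figures like these yourself, a minimal sketch follows. The timestamps are the first eight event times copied from the streaming output above. Because this sample covers only the start of the stream, its average gap need not match the figure for the whole run.

```python
from statistics import mean

# first few event times (seconds after request), copied from the streaming output above
event_times = [0.16, 0.19, 0.21, 0.24, 0.27, 0.29, 0.32, 0.35]

# time until the first piece of text arrived
time_to_first_token = event_times[0]

# gaps between consecutive events
gaps = [later - earlier for earlier, later in zip(event_times, event_times[1:])]
mean_gap = mean(gaps)

print(f"Time to first token: {time_to_first_token:.2f} s")
print(f"Mean gap between events: {mean_gap:.3f} s")
```

In the notebook itself, you could collect `event_time` values in a list inside the `for` loop and pass that list through the same calculation.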