Playing with Temperature and Top-p in OpenAI's API
In the previous two articles, we explored the impact of Temperature and Top-p on large language model (LLM) output: The Role of Temperature and Top-p in LLM Accuracy and Creativity provided a practical overview, while Understanding Temperature and Top-p in Large Language Models - A Technical Deep Dive delved into the technical details.
In both articles, we used hypothetical examples to illustrate these concepts. However, certain APIs, like OpenAI’s Chat Completions API, provide access to the actual token probabilities. This allows us to examine real-world examples and validate our understanding of these parameters.
OpenAI refers to these values as logprobs in their API: log probabilities computed as log(softmax(logit)). Working in log space mitigates the risk of extremely small probabilities being rounded down to zero during computation.
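To make that underflow point concrete, here is a rough sketch of how log(softmax(logit)) might be computed in a numerically stable way. The logit values are made up purely for illustration; the takeaway is that a token's probability can underflow to zero in floating point while its logprob remains an ordinary, representable number.

import math

def log_softmax(logits):
    # Subtract the largest logit before exponentiating so the sum stays in range.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]

# Made-up logits for illustration only.
logits = [12.0, 9.2, 2.1, -760.0]
for logit, logprob in zip(logits, log_softmax(logits)):
    # For the last token, math.exp(logprob) underflows to 0.0,
    # but the logprob itself is still a perfectly ordinary float.
    print(f"logit={logit:>8.1f}  logprob={logprob:>10.4f}  probability={math.exp(logprob):.3e}")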
Random tokens in a deterministic system #
Large Language Models are designed to be deterministic up to the token sampling stage. However, when interacting with the OpenAI API, you might notice that the probability values can vary between API calls, even with identical inputs.
I can’t tell for certain why that is. I haven’t been able to find an official explanation, or any conclusive unofficial one either. Perhaps it has to do with floating-point arithmetic differing between hardware in OpenAI’s large data centers, or perhaps it’s an intentional change that OpenAI added to keep their actual weights a secret.
The important thing to note is that the reported probability values can vary noticeably between API calls, so if we care about a specific value, we should run the same request multiple times and average the results.
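As a rough sketch of that idea, the snippet below averages the reported probability of the top token over several identical requests. It assumes a hypothetical helper, fetch_top_logprob(), that performs one API call like the one shown in the next section and returns the logprob of the selected token.

import math

def average_top_probability(fetch_top_logprob, runs=5):
    # fetch_top_logprob() is a hypothetical helper: it sends one API call with
    # identical input and returns the logprob of the selected token.
    probabilities = [math.exp(fetch_top_logprob()) for _ in range(runs)]
    return sum(probabilities) / len(probabilities)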
Fetching probabilities #
To get the probabilities we have to set logprobs to true, and set how many we want to see for each token with top_logprobs (between 1 and 20). In the example below, we also use max_completion_tokens to limit the response to a single token, although you can request probabilities for multiple tokens at once.
To convert the returned log probabilities (logprobs) back to standard probabilities, you need to compute the exponential function: probability = e^(logprob).
import openai
import math

response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": "The quick brown fox jumps over the",
        },
    ],
    logprobs=True,            # return log probabilities for the chosen tokens
    top_logprobs=10,          # also return the 10 most likely alternatives per token
    max_completion_tokens=1,  # limit the response to a single token
)

content = response.choices[0].logprobs.content[0]
print(f"Selected token: '{content.token}', Probability: {math.exp(content.logprob)}")
for e in content.top_logprobs:
    print(f"Token: '{e.token}', Probability: {math.exp(e.logprob)}")
The API call above might yield a probability list similar to this:
| Token | Probability |
|---|---|
| lazy | 0.92566 |
| The | 0.05918 |
| Sorry | 0.00801 |
| I’m | 0.00179 |
| Lazy | 0.00084 |
| l | 0.00066 |
| the | 0.00040 |
| I | 0.00031 |
| … | 0.00031 |
| What | 0.00031 |
As expected, 'lazy' emerges as the most probable next token.
Understanding Raw Probabilities #
It’s important to understand that these probabilities are derived directly from the model’s raw logits, prior to the application of Temperature or Top-p. Consequently, modifying the Temperature setting won’t alter these initial probabilities, and adjusting Top-p won’t restrict the number of tokens returned.
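One way to convince yourself of this is to request logprobs twice with very different temperature settings and compare the reported values. The sketch below does that for the same prompt as before; the values should match apart from the small call-to-call noise discussed earlier.

import openai

def top_logprobs_for(temperature):
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "system", "content": "The quick brown fox jumps over the"}],
        temperature=temperature,
        logprobs=True,
        top_logprobs=5,
        max_completion_tokens=1,
    )
    # Map each candidate token to its reported logprob
    return {e.token: e.logprob for e in response.choices[0].logprobs.content[0].top_logprobs}

# The reported logprobs should be (nearly) identical regardless of temperature.
print(top_logprobs_for(temperature=0.0))
print(top_logprobs_for(temperature=2.0))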
Applying Temperature and Top-p #
To apply the Temperature parameter to logprobs, we first divide each logprob by the Temperature value and then exponentiate the result: new_probability = e^(logprob / temperature).
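As a small worked example, the snippet below applies this scaling (followed by renormalization) to just the two most likely tokens from the table above, so the numbers won't match the full ten-token tables further down; it's only meant to show the direction of the effect.

import math

# Probabilities of the two most likely tokens from the table above.
probs = {"lazy": 0.92566, "The": 0.05918}

for temperature in (0.2, 1.0, 1.5):
    # math.log(p) recovers the logprob; divide by the Temperature and exponentiate.
    scaled = {tok: math.exp(math.log(p) / temperature) for tok, p in probs.items()}
    total = sum(scaled.values())
    normalized = {tok: v / total for tok, v in scaled.items()}
    print(temperature, {tok: round(v, 5) for tok, v in normalized.items()})

A Temperature below 1 sharpens the distribution further towards 'lazy', while a Temperature above 1 flattens it.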
Top-p filtering involves selecting tokens in descending order of probability until their cumulative probability exceeds the Top-p threshold. Importantly, the token that pushes the cumulative probability over the Top-p value is included in the filtered set.
After applying Temperature scaling and Top-p filtering, we need to renormalize the probabilities to ensure they sum to 1. That is done by calculating the total sum of all probabilities and dividing each probability by that value.
In code, it looks something like this:
def print_temperature_top_p(logprobs, temperature=1.0, top_p=1.0):
    temperature_probs = {}

    # 1. Convert logprobs to probabilities and apply Temperature scaling
    for obj in logprobs:
        temperature_probs[obj.token] = math.exp(obj.logprob / temperature)

    # 2. Normalize probabilities after Temperature scaling
    total_prob = sum(temperature_probs.values())
    temperature_probs = {k: v / total_prob for k, v in temperature_probs.items()}

    # 3. Apply Top-p (nucleus) filtering; the tokens are already sorted by
    #    probability, so we keep adding them until the cumulative probability
    #    exceeds top_p (the token that pushes us over the threshold is kept)
    cumulative_prob = 0
    top_p_filtered_probs = {}
    for token, prob in temperature_probs.items():
        if cumulative_prob <= top_p:
            top_p_filtered_probs[token] = prob
        else:
            break
        cumulative_prob += prob

    # 4. Normalize probabilities of the tokens that passed the Top-p filter
    total_prob = sum(top_p_filtered_probs.values())
    top_p_filtered_probs = {k: v / total_prob for k, v in top_p_filtered_probs.items()}

    # 5. Print the results
    print(
        "Token | Original Probability | After Temperature | Top-p Filtered Probability"
    )
    for obj in logprobs:
        original_prob = math.exp(obj.logprob)
        temp_prob = temperature_probs.get(obj.token, 0)
        final_prob = top_p_filtered_probs.get(obj.token, 0)
        print(
            f"{obj.token:<20} | {original_prob:.4f} | {temp_prob:.4f} | {final_prob:.4f}"
        )
You can use this function with the content.top_logprobs obtained from the previous code example, like this:
print_temperature_top_p(content.top_logprobs, temperature=0.2, top_p=0.5)
print_temperature_top_p(content.top_logprobs, temperature=1.5, top_p=0.95)
One output for temperature=0.2 and top_p=0.5 is shown below (with extra columns for the cumulative probability and the Top-p filtering status):
| Token | Probability | After Temperature | Cumulative P | Top-p Status | Final P |
|---|---|---|---|---|---|
| lazy | 0.92566 | 1.00000 | 1.00000 | ✓ | 1.00000 |
| The | 0.05918 | 0.00000 | 1.00000 | × | 0.00000 |
| Sorry | 0.00801 | 0.00000 | 1.00000 | × | 0.00000 |
| I’m | 0.00179 | 0.00000 | 1.00000 | × | 0.00000 |
| Lazy | 0.00084 | 0.00000 | 1.00000 | × | 0.00000 |
| l | 0.00066 | 0.00000 | 1.00000 | × | 0.00000 |
| the | 0.00040 | 0.00000 | 1.00000 | × | 0.00000 |
| I | 0.00031 | 0.00000 | 1.00000 | × | 0.00000 |
| … | 0.00031 | 0.00000 | 1.00000 | × | 0.00000 |
| What | 0.00031 | 0.00000 | 1.00000 | × | 0.00000 |
As expected, with a low Temperature and Top-p, the model becomes highly deterministic, assigning virtually all probability mass to the 'lazy' token.
And for temperature=1.5 and top_p=0.95:
| Token | Probability | After Temperature | Cumulative P | Top-p Status | Final P |
|---|---|---|---|---|---|
| lazy | 0.92566 | 0.79676 | 0.79676 | ✓ | 0.83193 |
| The | 0.05918 | 0.12739 | 0.92415 | ✓ | 0.13301 |
| Sorry | 0.00801 | 0.03358 | 0.95773 | ✓ | 0.03506 |
| I’m | 0.00179 | 0.01235 | 0.97008 | × | 0.00000 |
| Lazy | 0.00084 | 0.00749 | 0.97757 | × | 0.00000 |
| l | 0.00066 | 0.00634 | 0.98392 | × | 0.00000 |
| the | 0.00040 | 0.00454 | 0.98846 | × | 0.00000 |
| I | 0.00031 | 0.00385 | 0.99231 | × | 0.00000 |
| … | 0.00031 | 0.00385 | 0.99615 | × | 0.00000 |
| What | 0.00031 | 0.00385 | 1.00000 | × | 0.00000 |
With a higher Temperature and Top-p, the probability distribution becomes more diffuse, allowing for more diverse token selection.
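To close the loop, here is a minimal sketch of how a token could then be sampled from such a final, filtered distribution, using Python's random.choices. This only illustrates the last step of the pipeline; it is not how OpenAI's servers actually implement sampling.

import random

# The Top-p filtered, renormalized probabilities from the temperature=1.5, top_p=0.95 run above.
final_probs = {"lazy": 0.83193, "The": 0.13301, "Sorry": 0.03506}

tokens = list(final_probs.keys())
weights = list(final_probs.values())

# Sample a handful of next tokens; most draws should be 'lazy', with occasional variety.
print(random.choices(tokens, weights=weights, k=10))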
Jupyter Notebook #
For further exploration, I’ve created a Jupyter Notebook that provides a starting point for experimenting with Temperature, Top-p, and OpenAI’s log probabilities. It includes the code examples used in this article.