Understanding Temperature and Top-p in Large Language Models: A Technical Deep Dive
This article delves into the technical mechanisms behind Temperature and Top-p, two key parameters that control the predictability and creativity of text generated by Large Language Models (LLMs). In our previous post, The Role of Temperature and Top-p in LLM Accuracy and Creativity, we explored how these settings influence the LLM’s output by affecting the token selection process. We saw that higher values produce more varied and creative text, while lower values yield more deterministic and predictable results, each suitable for different applications.
In this follow-up, we’ll examine how these values affect the token selection algorithm under the hood. While you can use LLMs through APIs without knowing the inner workings of Temperature and Top-p, understanding the details will give you a deeper intuition and let you fine-tune your applications for optimal results.
Logits and Softmax: From Raw Scores to Probabilities #
When the LLM generates the list of potential tokens, it assigns each token a raw value called a logit. This value can be either positive or negative, and its actual range depends on the implementation of the LLM. It represents the model’s confidence in each token being the next word in the sequence.
Let’s reuse the example from our previous article, where the LLM predicts the next token for the text ‘The quick brown fox jumps over the’. Listing raw logits instead of the calculated probabilities might yield the following table.
Token | Logit |
---|---|
lazy | 2.0000 |
quick | 1.0000 |
tired | 0.0000 |
slow | -1.0000 |
clumsy | -2.0000 |
awkward | -2.0000 |
Then, in the next step, the LLM uses a function called Softmax to convert the logits into probability scores. The Softmax function works by raising e to the power of each logit (e^logit) and then normalizing these exponentiated values so they sum to 1.
Softmax is crucial because it transforms the raw logits into a probability distribution, making it possible for the decoding strategy to sample the next token.
Applying Softmax to the logits above (at the default Temperature of 1 and before any Top-p filtering) gives the following distribution.
Token | Logit | Probability (after Softmax) |
---|---|---|
lazy | 2.0000 | 0.6291 |
quick | 1.0000 | 0.2314 |
tired | 0.0000 | 0.0851 |
slow | -1.0000 | 0.0313 |
clumsy | -2.0000 | 0.0115 |
awkward | -2.0000 | 0.0115 |
In this example, there’s about a 63% chance that the next token will be lazy, and only a ~1% chance that it will be clumsy or awkward.
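If you want to reproduce these numbers yourself, here’s a minimal Python sketch (using NumPy) that applies Softmax to the illustrative logits from the table above. It’s a toy calculation, not the internals of any particular LLM.

```python
import numpy as np

# Illustrative logits from the table above (not taken from a real model)
tokens = ["lazy", "quick", "tired", "slow", "clumsy", "awkward"]
logits = np.array([2.0, 1.0, 0.0, -1.0, -2.0, -2.0])

def softmax(x):
    # Subtract the max before exponentiating for numerical stability;
    # the resulting probabilities are unchanged.
    e = np.exp(x - np.max(x))
    return e / e.sum()

for token, p in zip(tokens, softmax(logits)):
    print(f"{token:<8} {p:.4f}")
# lazy 0.6291, quick 0.2314, tired 0.0851, slow 0.0313, clumsy 0.0115, awkward 0.0115
```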
Temperature: Scaling Logits to Control Randomness #
But before the LLM calculates the probabilities, it applies the selected Temperature by dividing each original logit by the Temperature value.
Dividing by a value less than 1 (like 0.2) increases the difference between the logits, making larger logits even larger and smaller logits even smaller (more negative). And conversely, dividing by a value greater than 1 (like 1.5) decreases the magnitude of the logits, making the differences between them smaller.
Here’s the example from above with a really low Temperature of 0.2.
Token | Logit | Logit/Temperature | Probability |
---|---|---|---|
lazy | 2.0000 | 10.0000 | 0.9933 |
quick | 1.0000 | 5.0000 | 0.0067 |
tired | 0.0000 | 0.0000 | 0.0000 |
slow | -1.0000 | -5.0000 | 0.0000 |
clumsy | -2.0000 | -10.0000 | 0.0000 |
awkward | -2.0000 | -10.0000 | 0.0000 |
As we can see, that almost completely removes all but the most likely token from the selection. Conversely, with a high Temperature such as 1.5, the differences between the logits shrink, which results in a more even probability distribution.
Token | Logit | Logit/Temperature | Probability |
---|---|---|---|
lazy | 2.0000 | 1.3333 | 0.4875 |
quick | 1.0000 | 0.6667 | 0.2503 |
tired | 0.0000 | 0.0000 | 0.1285 |
slow | -1.0000 | -0.6667 | 0.0660 |
clumsy | -2.0000 | -1.3333 | 0.0339 |
awkward | -2.0000 | -1.3333 | 0.0339 |
The default value for Temperature is 1, which keeps the original logits unchanged. Also note that while some services allow 0 for Temperature, setting Temperature to exactly 0 is mathematically problematic (division by zero). Most APIs interpret 0 as a very low temperature for practical purposes, leading to highly deterministic output. However, using a very small value like 0.0001 can give you more predictable and consistent behavior across different systems.
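The Temperature step itself is tiny; here’s a Python sketch of it using the same illustrative logits as above. The clamping of very small Temperature values to 0.0001 is an assumption for this toy example, mirroring the workaround just described.

```python
import numpy as np

tokens = ["lazy", "quick", "tired", "slow", "clumsy", "awkward"]
logits = np.array([2.0, 1.0, 0.0, -1.0, -2.0, -2.0])  # illustrative values from above

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def probabilities_with_temperature(logits, temperature):
    # Clamp very small Temperatures to avoid dividing by zero,
    # mirroring the 0.0001 workaround mentioned above.
    temperature = max(temperature, 1e-4)
    return softmax(logits / temperature)

for t in (0.2, 1.0, 1.5):
    probs = probabilities_with_temperature(logits, t)
    print(f"T={t}: " + ", ".join(f"{tok}={p:.4f}" for tok, p in zip(tokens, probs)))
# T=0.2 concentrates almost all probability on "lazy" (0.9933),
# while T=1.5 flattens the distribution (lazy drops to 0.4875).
```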
Top-p (Nucleus Sampling): Focusing on the Most Probable Tokens #
Top-p sampling, or nucleus sampling, is applied after the probabilities have been calculated with Softmax. It works by summing the probabilities of the tokens, from most to least probable, until the cumulative probability exceeds the Top-p value. The probabilities of the selected tokens are then adjusted (renormalized) so that they again add up to 1. This ensures we still have a valid probability distribution for sampling.
So for the example above, with a Temperature of 1.5 and a Top-p value of 0.75, we would get:
Token | Probability | Cumulative P | Top-P Status | Adjusted P |
---|---|---|---|---|
lazy | 0.4875 | 0.4875 | ✓ | 0.5627 |
quick | 0.2503 | 0.7378 | ✓ | 0.2889 |
tired | 0.1285 | 0.8663 | ✓ | 0.1483 |
slow | 0.0660 | 0.9323 | × | 0.0000 |
clumsy | 0.0339 | 0.9661 | × | 0.0000 |
awkward | 0.0339 | 1.0000 | × | 0.0000 |
Only the first 3 tokens would be included in the filtered list. Those would then get a new probability score, so the total adds up to 1.
Token | Probability |
---|---|
lazy | 0.5627 |
quick | 0.2889 |
tired | 0.1483 |
It’s important to note that Temperature and Top-p work in conjunction. Temperature modifies the initial probability distribution, and then Top-p sampling filters tokens from this modified distribution.
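Here’s a small Python sketch of just the Top-p step, taking the Temperature-1.5 probabilities from the table above as input. It follows the “keep tokens until the cumulative probability exceeds Top-p” rule described in this article; individual implementations may handle the boundary slightly differently.

```python
import numpy as np

def top_p_filter(tokens, probs, top_p):
    order = np.argsort(probs)[::-1]           # sort tokens by probability, highest first
    cumulative = np.cumsum(probs[order])
    # Keep tokens until the cumulative probability exceeds top_p;
    # the token that crosses the threshold is included.
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1
    kept = order[:cutoff]
    kept_probs = probs[kept] / probs[kept].sum()  # renormalize so they sum to 1 again
    return [tokens[i] for i in kept], kept_probs

tokens = ["lazy", "quick", "tired", "slow", "clumsy", "awkward"]
probs = np.array([0.4875, 0.2503, 0.1285, 0.0660, 0.0339, 0.0339])  # Temperature 1.5 values from above

for tok, p in zip(*top_p_filter(tokens, probs, top_p=0.75)):
    print(f"{tok:<8} {p:.4f}")
# lazy 0.5627, quick 0.2889, tired 0.1483
```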
Putting it all Together: How Temperature and Top-p Control LLM Output #
To summarize how Temperature and Top-p control LLM output, here’s a step-by-step breakdown (a short code sketch after the list ties the steps together):
- Generate Logits: The LLM first produces a list of potential next tokens, assigning each a raw score called a Logit. These logits represent the model’s initial prediction of token likelihood.
- Apply Temperature: The Temperature value is applied by dividing each Logit by the Temperature. This adjusts the distribution:
- Lower Temperature (e.g., 0.2) sharpens the distribution, making high-probability tokens even more likely and low-probability tokens even less likely.
- Higher Temperature (e.g., 1.5) flattens the distribution, making token probabilities more even.
- Convert to Probabilities (Softmax): The Softmax function then transforms the adjusted Logits into probabilities. These probabilities now sum up to 1 and represent the likelihood of each token being selected.
- Apply Top-p Filtering: If Top-p sampling is used, the algorithm sorts the tokens by probability and selects the smallest set of most probable tokens whose cumulative probability exceeds the Top-p value. Tokens outside this “nucleus” are discarded.
- Renormalize Probabilities: The probabilities of the selected tokens within the Top-p nucleus are then renormalized so they sum back up to 1.
- Sample Next Token: Finally, the algorithm randomly samples the next token from the resulting probability distribution. Tokens with higher adjusted probabilities are more likely to be chosen, but randomness is still involved.
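The following Python sketch strings these steps together into a single decoding step. The function name sample_next_token and the 0.0001 lower bound on Temperature are assumptions for this toy example, not part of any real LLM or API.

```python
import numpy as np

def sample_next_token(tokens, logits, temperature=1.0, top_p=1.0, rng=None):
    # Toy decoding step combining Temperature and Top-p (nucleus) sampling.
    # This is an illustrative sketch, not the implementation of any specific LLM or API.
    if rng is None:
        rng = np.random.default_rng()
    logits = np.asarray(logits, dtype=float)

    # Steps 1-2: apply Temperature (clamped to avoid dividing by zero)
    scaled = logits / max(temperature, 1e-4)

    # Step 3: Softmax turns the scaled logits into probabilities
    e = np.exp(scaled - scaled.max())
    probs = e / e.sum()

    # Step 4: keep the smallest set of top tokens whose cumulative probability exceeds top_p
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    kept = order[: int(np.searchsorted(cumulative, top_p)) + 1]

    # Step 5: renormalize the remaining probabilities
    kept_probs = probs[kept] / probs[kept].sum()

    # Step 6: sample the next token from the filtered distribution
    return tokens[rng.choice(kept, p=kept_probs)]

tokens = ["lazy", "quick", "tired", "slow", "clumsy", "awkward"]
logits = [2.0, 1.0, 0.0, -1.0, -2.0, -2.0]  # illustrative values from the tables above

print(sample_next_token(tokens, logits, temperature=0.2, top_p=0.5))   # practically always "lazy"
print(sample_next_token(tokens, logits, temperature=1.5, top_p=0.95))  # varies between runs
```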
We can visualize this through a couple of examples. First, with a low Temperature (0.2) and a low Top-p (0.5).
Token | Logit | Logit/Temperature | Probability | Cumulative P | Top-P Status | Adjusted P |
---|---|---|---|---|---|---|
lazy | 2.0000 | 10.0000 | 0.9933 | 0.9933 | ✓ | 1.0000 |
quick | 1.0000 | 5.0000 | 0.0067 | 1.0000 | × | 0.0000 |
tired | 0.0000 | 0.0000 | 0.0000 | 1.0000 | × | 0.0000 |
slow | -1.0000 | -5.0000 | 0.0000 | 1.0000 | × | 0.0000 |
clumsy | -2.0000 | -10.0000 | 0.0000 | 1.0000 | × | 0.0000 |
awkward | -2.0000 | -10.0000 | 0.0000 | 1.0000 | × | 0.0000 |
As seen above, that combination will select lazy every time. Conversely, we would get a more random output with a high Temperature (1.5) and a high Top-p (0.95).
Token | Logit | Logit/Temperature | Probability | Cumulative P | Top-P Status | Adjusted P |
---|---|---|---|---|---|---|
lazy | 2.0000 | 1.3333 | 0.4875 | 0.4875 | ✓ | 0.5046 |
quick | 1.0000 | 0.6667 | 0.2503 | 0.7378 | ✓ | 0.2591 |
tired | 0.0000 | 0.0000 | 0.1285 | 0.8663 | ✓ | 0.1330 |
slow | -1.0000 | -0.6667 | 0.0660 | 0.9323 | ✓ | 0.0683 |
clumsy | -2.0000 | -1.3333 | 0.0339 | 0.9661 | ✓ | 0.0351 |
awkward | -2.0000 | -1.3333 | 0.0339 | 1.0000 | × | 0.0000 |
In this example, lazy will only be selected in about 50% of all outputs.
By experimenting with these parameters, developers and data scientists can tailor LLM-driven or generative AI applications — ranging from chatbots to content generation — so the resulting text is either precise and factual or highly creative, depending on the use case. Subtle adjustments to Temperature and Top-p often make a significant difference in user experience and output quality.
Also read the third and final part Playing with Temperature and Top-p in Open AI’s API where we look at actual probabilities from the Open AI API.