The Role of Temperature and Top-p in LLM Accuracy and Creativity
To control the output of large language models (LLMs), developers often turn to two key parameters: Temperature and Top-p. These parameters let you tune the balance between predictability and creativity in an LLM’s responses, making the same model suitable for a wider range of applications.
Lower values are great for accuracy-focused tasks like code generation or information retrieval, ensuring more predictable results. In contrast, higher values inject randomness, sparking creativity in tasks like copywriting, story generation, or brainstorming.
How LLMs Predict the Next Word
At their core, LLMs operate by predicting the next word in a sequence. They do this by generating a list of potential tokens (words or parts of words) that could logically follow the current text. Each token receives a probability score reflecting its likelihood of appearing next, based on the LLM’s training data. The model then uses a decoding strategy, a specific selection algorithm, to pick the next token from this ranked list. This is where Temperature and Top-p come into play: they guide how the decoding strategy selects the next token.
Consider this example: the LLM is trying to complete the sentence ‘The quick brown fox jumps over the’. Below is a potential set of next tokens and their probabilities.
Token | Probability |
---|---|
lazy | 0.75 |
sleeping | 0.10 |
tired | 0.05 |
fast | 0.03 |
small | 0.02 |
large | 0.01 |
red | 0.01 |
green | 0.01 |
blue | 0.01 |
agile | 0.005 |
clumsy | 0.005 |
In this instance, “lazy” holds the highest probability, making it the most likely choice, while “clumsy” has a near-zero probability.
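To make the mechanics concrete, here is a minimal Python sketch of a sampling-based decoding strategy over this exact list (the `candidates` dictionary is hard-coded from the table above purely for illustration, not taken from any real model):

```python
import random

# Candidate next tokens and their probabilities, from the table above.
candidates = {
    "lazy": 0.75, "sleeping": 0.10, "tired": 0.05, "fast": 0.03,
    "small": 0.02, "large": 0.01, "red": 0.01, "green": 0.01,
    "blue": 0.01, "agile": 0.005, "clumsy": 0.005,
}

# A sampling decoder draws one token at random, weighted by probability.
# A greedy decoder would instead always take the top-ranked token:
# max(candidates, key=candidates.get) -> "lazy".
next_token = random.choices(
    list(candidates), weights=list(candidates.values()), k=1
)[0]
print(next_token)  # "lazy" about 75% of the time
```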
Temperature: Controlling Randomness in Output
Temperature controls the randomness of the output by adjusting the probability distribution of tokens. Lower values increase the probability of the most likely tokens, leading to more predictable outputs, while higher values distribute the probability more evenly, introducing more variability and surprise.
For instance, with a low Temperature value like 0.2, the probability distribution might shift to something like this:
Token | Probability |
---|---|
lazy | 0.999 |
sleeping | 0.001 |
tired | 0.000 |
fast | 0.000 |
small | 0.000 |
large | 0.000 |
red | 0.000 |
green | 0.000 |
blue | 0.000 |
agile | 0.000 |
clumsy | 0.000 |
Observe how, at this low Temperature, the probability overwhelmingly concentrates on “lazy,” virtually eliminating other options. With a higher Temperature of 1.5, by contrast, the probabilities might shift to something like this:
Token | Probability |
---|---|
lazy | 0.457 |
sleeping | 0.157 |
tired | 0.078 |
fast | 0.031 |
small | 0.015 |
large | 0.009 |
red | 0.009 |
green | 0.009 |
blue | 0.009 |
agile | 0.006 |
clumsy | 0.006 |
As you can see, the higher Temperature flattens the probability distribution, making the choice of the next token less obvious.
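Mechanically, Temperature divides the model’s raw scores (logits) before they are converted into probabilities; applied directly to an existing probability distribution, that is equivalent to raising each probability to the power 1/T and re-normalizing. Here is a minimal sketch building on the `candidates` dictionary from the earlier snippet (the function name is illustrative, and because the tables above are rounded illustrations rather than real model output, the printed numbers will not match them exactly):

```python
def apply_temperature(probs: dict[str, float], temperature: float) -> dict[str, float]:
    """Sharpen (T < 1) or flatten (T > 1) a probability distribution.

    Raising each probability to 1/T and re-normalizing is equivalent to
    dividing the logits by T before the softmax.
    """
    scaled = {tok: p ** (1.0 / temperature) for tok, p in probs.items()}
    total = sum(scaled.values())
    return {tok: s / total for tok, s in scaled.items()}

print(round(apply_temperature(candidates, 0.2)["lazy"], 3))  # ~1.0: "lazy" dominates
print(round(apply_temperature(candidates, 1.5)["lazy"], 3))  # ~0.519: much flatter
```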
Applying Temperature for Different Tasks
For creative tasks like writing stories or brainstorming ideas, a higher Temperature is your friend. It encourages the LLM to occasionally pick less common tokens, potentially leading your text in unexpected and interesting directions. Consider the sentence “The horse’s mane was”. With a higher Temperature, the model might offer a wider range of possibilities:
Token | Probability |
---|---|
long | 0.60 |
flowing | 0.30 |
silky | 0.05 |
gossamer | 0.03 |
braided | 0.01 |
unruly | 0.005 |
If ‘long’ is chosen, the sentence might become “The horse’s mane was long, flowing gracefully in the wind”, a standard description. However, if ‘unruly’ is selected, the sentence could transform into “The horse’s mane was unruly, a tangled, rebellious mass of dark, thick hair”, a more striking and unusual description.
However, for tasks demanding accuracy and predictability, such as generating technical documentation or code, a lower Temperature is generally preferred. This encourages the LLM to stick to the most probable and straightforward tokens. For example, consider this list of candidate completions for the snippet `public void greet`:
Token | Probability |
---|---|
User | 0.85 |
Customer | 0.05 |
Client | 0.04 |
Visitor | 0.03 |
Patron | 0.02 |
Individual | 0.01 |
Here, `public void greetUser(String name)` is preferable to `public void greetIndividual(String name)` in most cases.
In essence, adjust the Temperature based on your goal: higher for exploration and creativity, lower for precision and accuracy.
Top-p: Focusing on the Most Likely Tokens
Top-p, also known as nucleus sampling, limits token selection to the smallest set of the most probable tokens whose combined probability reaches the Top-p threshold. With a Top-p value of 1, no tokens are filtered out, so the probability list looks exactly the same.
Token | Probability |
---|---|
lazy | 0.75 |
sleeping | 0.10 |
tired | 0.05 |
fast | 0.03 |
small | 0.02 |
large | 0.01 |
red | 0.01 |
green | 0.01 |
blue | 0.01 |
agile | 0.005 |
clumsy | 0.005 |
Lowering it to 0.9, however, limits selection to only the top choices, the smallest set of tokens whose probabilities add up to at least 0.9:
Token | Probability | New Probability |
---|---|---|
lazy | 0.75 | 0.833 |
sleeping | 0.10 | 0.111 |
tired | 0.05 | 0.056 |
Note: after the token list is truncated based on the Top-p value, the probabilities of the remaining tokens are re-normalized to ensure they still sum to 1, as seen in the third column.
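That truncate-and-re-normalize step is straightforward to sketch in Python (again with an illustrative function name): sort the tokens by probability, keep the smallest set whose cumulative probability reaches the threshold, then divide each survivor by the kept total. With the `candidates` dictionary from earlier and a Top-p of 0.9, it reproduces the three rows above:

```python
def apply_top_p(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the smallest top-ranked set of tokens whose cumulative
    probability reaches top_p, then re-normalize the survivors to sum to 1."""
    kept, cumulative = {}, 0.0
    for tok, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    return {tok: p / cumulative for tok, p in kept.items()}

print(apply_top_p(candidates, 0.9))
# {'lazy': 0.833..., 'sleeping': 0.111..., 'tired': 0.055...}
```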
Temperature adjusts the likelihood of all tokens, making low-probability tokens more or less likely. In contrast, Top-p acts as a filter, entirely removing less probable tokens from consideration. Therefore, combining a lower Top-p value with a higher Temperature can lead to output that is more varied than with a low Temperature alone, but still generally stays within the realm of the more probable tokens.
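Putting the two together, here is a sketch of how they might be chained (assuming the common implementation order of Temperature first, then Top-p filtering, then sampling, and reusing the illustrative helpers above):

```python
import random

def sample_next_token(probs: dict[str, float],
                      temperature: float = 1.0, top_p: float = 1.0) -> str:
    """Temperature reshapes the whole distribution, Top-p then filters out
    the unlikely tail, and one token is sampled from whatever survives."""
    filtered = apply_top_p(apply_temperature(probs, temperature), top_p)
    return random.choices(list(filtered), weights=list(filtered.values()), k=1)[0]

# Higher Temperature with a moderate Top-p: varied, but still plausible.
print(sample_next_token(candidates, temperature=1.5, top_p=0.9))
```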
Conclusion
To summarize, Temperature and Top-p are key parameters that determine the balance between randomness and determinism in LLM text generation. For accuracy-driven tasks, lower values are preferred; for creative endeavors, higher values are ideal.
Keep in mind that the examples presented here are somewhat exaggerated to clearly illustrate the effects of these parameters. In practice, the differences produced by subtle adjustments might be less dramatic, especially in shorter responses.