The Concepts of Temperature and Top-p

As we saw earlier, every time a text-generating AI builds a sentence, it calculates several candidate tokens that could come next, selects one, appends it, and repeats the process token by token. Temperature and Top-p are the settings that control how boldly that selection is made.

When the tone of a response changes even though you asked the same question, this selection mechanism is usually the cause. The model has not changed. What has changed is which candidates were considered and which one was chosen.

Temperature: How Much Variety to Allow in the Next Token Selection

AI does not always pick only the "most probable candidate." It calculates multiple candidates probabilistically and selects one from among them. Temperature controls how much variety is allowed in that selection.

Consider this question as an example:

Question for Explaining Temperature
Describe autumn in one sentence.

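AI internally calculates several candidates. They are shown here as whole sentences for readability, although the actual selection happens one token at a time: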

  • Autumn is a season of cool breezes.
  • Autumn is a time when leaves turn vivid shades of red.
  • Autumn is a transition period between summer and winter.
  • Autumn is a season when solitude deepens.

With a low Temperature, the most common, textbook-style candidate is the one most likely to be selected:

Autumn is a season of cool breezes.

As Temperature increases, expressions that would otherwise be selected less often may also appear:

Autumn is the time when sunlight slowly cools.

An important point here is that Temperature is not a "creativity switch." It does not create new candidates; the candidates and their scores have already been calculated. Temperature only rescales those scores before one is picked. A low Temperature exaggerates the gaps between candidates, so the selection locks almost entirely onto the top candidate. A high Temperature flattens the gaps, so lower-ranking candidates are chosen more often.
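The rescaling described above can be sketched in a few lines. This is a minimal illustration, not any particular model's implementation: the scores for the four autumn candidates are made up, and the function name softmax_with_temperature is hypothetical. It assumes the common approach of dividing each raw score by the Temperature before converting the scores to probabilities.

```python
import math

def softmax_with_temperature(scores, temperature):
    """Convert raw scores to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for the four autumn candidates (illustrative values only).
candidates = ["cool breezes", "red leaves", "transition period", "solitude deepens"]
scores = [3.0, 2.0, 1.0, 0.5]

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(scores, t)
    print(f"T={t}:", dict(zip(candidates, (round(p, 3) for p in probs))))
```

With these made-up scores, the top candidate takes almost all of the probability at T=0.2, while at T=2.0 the four candidates are much closer together. The ranking of the candidates never changes; only the size of the gaps does.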

Setting Temperature too high can cause the logical flow to waver or sentences to become unstable. Conversely, setting it too low makes responses overly repetitive and formulaic.

Top-p: How Far to Extend the Pool of Candidate Tokens

Unlike Temperature, Top-p adjusts the range of candidates considered in the first place. AI internally calculates many candidates, but some have very low probability.

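Top-p ranks the candidates by probability and keeps them, starting from the most likely, until their cumulative probability reaches the threshold p. The rest are excluded entirely. With Top-p set to 0.9, for example, only the most likely candidates that together account for 90% of the probability remain. Simply put, it cuts out options that are too unlikely before they are even considered.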

For example, suppose the candidates following "Spring is" are calculated as:

  • A season when flowers bloom.
  • A time when new shoots sprout.
  • A great time to travel.
  • A period when the stock market becomes unstable.

The first three are common expressions, but the last is likely to have a low probability in this context. Setting Top-p low excludes such low-probability candidates from consideration. Setting Top-p high leaves more varied expressions in the candidate pool.
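The cutoff can be sketched as follows. The probabilities assigned to the four "Spring is" candidates are invented for illustration, and top_p_filter is a hypothetical helper name. The sketch assumes the usual formulation: sort by probability, keep candidates until the running total reaches the threshold, then renormalize the survivors.

```python
def top_p_filter(probs, top_p):
    """Keep the most likely candidates until their cumulative probability
    reaches top_p, then renormalize the ones that remain."""
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = {}, 0.0
    for candidate, p in ranked:
        kept[candidate] = p
        cumulative += p
        if cumulative >= top_p:
            break  # threshold reached: everything after this is dropped
    total = sum(kept.values())
    return {c: p / total for c, p in kept.items()}

# Hypothetical probabilities for continuations of "Spring is" (illustrative only).
spring = {
    "a season when flowers bloom": 0.50,
    "a time when new shoots sprout": 0.30,
    "a great time to travel": 0.15,
    "a period when the stock market becomes unstable": 0.05,
}

print(top_p_filter(spring, 0.9))  # the stock-market candidate is cut
print(top_p_filter(spring, 1.0))  # all four candidates remain
```

Note that the candidate that crosses the threshold is itself kept, so at least one candidate always survives.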

In summary, Temperature determines how boldly to select from the remaining candidates, while Top-p determines which candidates to retain in the first place.

What Happens When Both Settings Work Together?

In practice, Temperature and Top-p are applied together in the same selection step.

  • Low Temperature + Low Top-p → stable, predictable responses
  • Low Temperature + High Top-p → relatively stable but with slightly more varied expression
  • High Temperature + Low Top-p → more varied choices, but only among the most likely candidates
  • High Temperature + High Top-p → creative but potentially inconsistent

For example, tasks where accuracy is important, such as report summarization, legal document drafting, or code generation, suit lower Temperature. Tasks that call for diverse expression, such as story writing, ad copywriting, or idea brainstorming, benefit from somewhat higher settings.
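To see how the two settings interact, here is a rough sketch of a single selection step. It assumes Temperature is applied first and Top-p second, which is a common ordering but not the only possible one. The function sample_next, the candidate scores, and the three presets are all hypothetical values chosen for illustration.

```python
import math
import random

def sample_next(scores, temperature=1.0, top_p=1.0, rng=random):
    """Pick one candidate: temperature-scaled softmax, top-p cut, weighted draw."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # 1. Temperature: rescale the scores, then convert them to probabilities.
    scaled = {c: s / temperature for c, s in scores.items()}
    peak = max(scaled.values())
    exps = {c: math.exp(s - peak) for c, s in scaled.items()}
    total = sum(exps.values())
    ranked = sorted(((c, e / total) for c, e in exps.items()),
                    key=lambda kv: kv[1], reverse=True)
    # 2. Top-p: keep top-ranked candidates until the cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for c, p in ranked:
        kept.append((c, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # 3. Draw one of the surviving candidates, weighted by probability.
    names = [c for c, _ in kept]
    weights = [p for _, p in kept]
    return rng.choices(names, weights=weights, k=1)[0]

# Hypothetical scores and presets (illustrative values only).
scores = {"cool breezes": 3.0, "red leaves": 2.0,
          "transition period": 1.0, "solitude deepens": 0.5}
presets = [("low T, low p", 0.3, 0.5),
           ("low T, high p", 0.3, 0.95),
           ("high T, high p", 1.5, 0.95)]

rng = random.Random(0)  # fixed seed so the run is repeatable
for label, t, p in presets:
    draws = [sample_next(scores, t, p, rng) for _ in range(10)]
    print(label, "->", sorted(set(draws)))
```

With these scores, the first preset leaves only the top candidate after the Top-p cut, so every draw returns the same answer. The looser presets leave more candidates in play.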

Why Does the Same Question Get Different Answers?

The reason responses vary slightly when you enter the same question multiple times is that AI always calculates several candidates and selects one from among them. Temperature and Top-p control the breadth and scope of that selection.

AI is not a system that stores a correct answer and retrieves it. It calculates and selects at every moment. How conservative or how open to variety that selection is depends largely on these two settings.
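A small simulation makes this concrete. It "asks the same question" many times over a fixed set of candidates and tallies the answers. The candidate scores and the helper names ask_once and ask_many are hypothetical, and a real model repeats this draw for every token rather than once per response.

```python
import math
import random
from collections import Counter

def ask_once(scores, temperature, rng):
    """Simulate one response: temperature-scaled weights, then one weighted draw."""
    names = list(scores)
    peak = max(scores.values())
    weights = [math.exp((scores[n] - peak) / temperature) for n in names]
    return rng.choices(names, weights=weights, k=1)[0]

def ask_many(scores, temperature, times, seed=0):
    """Ask the same question repeatedly and count how often each answer appears."""
    rng = random.Random(seed)
    return Counter(ask_once(scores, temperature, rng) for _ in range(times))

# Hypothetical scores for the autumn candidates (illustrative values only).
scores = {"cool breezes": 3.0, "red leaves": 2.0,
          "transition period": 1.0, "solitude deepens": 0.5}

print(ask_many(scores, temperature=0.01, times=1000))  # near-zero Temperature
print(ask_many(scores, temperature=1.0, times=1000))   # moderate Temperature
```

At a near-zero Temperature, every run lands on the same top candidate. At a moderate Temperature, the same scores produce a mix of answers, which is the variation you see when you repeat a question.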

Understanding Temperature and Top-p explains why AI responses sometimes look textbook-like and sometimes feel creative. It also gives you a way to tune the character of a response to fit your purpose.