Why temperature T in softmax(x/T) controls entropy: T→0 is argmax, T→∞ is uniform

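As T→0, softmax(x/T) approaches a one-hot argmax on the largest logit (assuming a unique maximum), which drives entropy down toward 0. As T→∞, all scaled logits shrink toward 0 and the output approaches the uniform distribution, which maximizes entropy at log n for n classes.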
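A minimal Python sketch of the claim, using hypothetical example logits. It computes softmax(x/T) at a small, a moderate, and a large temperature and reports the Shannon entropy (in nats) of each result. The entropy should rise toward log n as T grows.

```python
import math

def softmax_with_temperature(logits, T):
    """Return softmax(logits / T); T must be > 0."""
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(p):
    """Shannon entropy in nats; zero-probability terms contribute 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical example logits
for T in (0.05, 1.0, 100.0):
    p = softmax_with_temperature(logits, T)
    print(f"T={T:>6}: max p={max(p):.3f}, entropy={entropy(p):.3f} nats")
print(f"log(n) = {math.log(len(logits)):.3f} nats")
```

At T = 0.05 nearly all mass sits on the first logit. At T = 100 the entropy is within a small fraction of log 4.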

Related concepts

Write the attention score formula before softmax: e_ij = a(s_i, h_j)

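Attention score formula (additive, Bahdanau-style): e_ij = a(s_i, h_j) = v_a^T tanh(W_s s_i + W_h h_j + b). The score is unnormalized. The exponential belongs to the softmax that follows: α_ij = exp(e_ij) / Σ_k exp(e_ik).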
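A minimal pure-Python sketch of the additive score, with made-up toy dimensions and weights. s_i, H, W_s, W_h, b and v are all hypothetical values, not trained parameters. The vector v (often written v_a) reduces the tanh output to the scalar e_ij. The softmax is applied only afterwards to turn scores into weights.

```python
import math

def additive_scores(s_i, H, W_s, W_h, b, v):
    """Bahdanau-style scores e_ij = v^T tanh(W_s s_i + W_h h_j + b) for each h_j in H."""
    def matvec(M, x):
        return [sum(m * xk for m, xk in zip(row, x)) for row in M]

    proj_s = matvec(W_s, s_i)  # query projection, shared across all j
    scores = []
    for h_j in H:
        proj_h = matvec(W_h, h_j)
        hidden = [math.tanh(a + c + bk) for a, c, bk in zip(proj_s, proj_h, b)]
        scores.append(sum(vk * hk for vk, hk in zip(v, hidden)))  # scalar e_ij
    return scores

def softmax(z):
    m = max(z)
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy sizes: decoder state dim 2, encoder state dim 3, attention dim 2.
s_i = [0.5, -1.0]
H = [[1.0, 0.0, 0.2], [0.0, 1.0, -0.3], [0.4, 0.4, 0.4]]
W_s = [[0.3, -0.2], [0.1, 0.4]]
W_h = [[0.5, -0.1, 0.2], [-0.3, 0.2, 0.6]]
b = [0.0, 0.1]
v = [1.0, -0.5]

e = additive_scores(s_i, H, W_s, W_h, b, v)  # raw scores before softmax
alpha = softmax(e)                            # attention weights over the 3 encoder states
print(e, alpha)
```

Because tanh lies in (-1, 1), each raw score here is bounded by the sum of |v_k|. The softmax weights are positive and sum to 1.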

What the second law of thermodynamics says — entropy of an isolated system never decreases

The entropy of an isolated system never decreases. It grows during irreversible processes and stays constant only for reversible processes or at equilibrium.

Why SGD with momentum escapes local minima better than vanilla SGD

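Momentum SGD keeps a velocity, an exponentially decaying sum of past gradients (v ← βv + ∇L, θ ← θ - ηv). The resulting inertia can carry parameters through shallow local minima and flat regions where vanilla SGD stalls, and it damps oscillation along steep directions.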
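A deterministic sketch of velocity accumulation. It assumes a hypothetical loss with a constant gentle slope and uses no minibatch noise, so it shows plain gradient descent with and without momentum. With β = 0.9, the momentum step grows toward lr·|g|/(1 - β), ten times the vanilla step. This is the same inertia the card credits with carrying parameters through shallow dips.

```python
def run(grad, x0, lr, steps, beta=0.0):
    """Heavy-ball update: v <- beta * v + grad(x); x <- x - lr * v.

    beta = 0 reduces this to vanilla gradient descent.
    """
    x, v = x0, 0.0
    for _ in range(steps):
        v = beta * v + grad(x)
        x = x - lr * v
    return x

def gentle_slope_grad(x):
    """Gradient of the hypothetical loss L(x) = -0.1 * x (constant slope)."""
    return -0.1

plain = run(gentle_slope_grad, x0=0.0, lr=0.1, steps=50)
heavy = run(gentle_slope_grad, x0=0.0, lr=0.1, steps=50, beta=0.9)
print(plain, heavy)
```

After 50 steps the vanilla run has moved about 0.5, while the momentum run has moved roughly 4.1. On a real loss surface the outcome depends on the landscape, learning rate and β, so momentum is not guaranteed to escape any particular minimum.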

What AWQ does differently — activation-aware weight quantization preserves important weights

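AWQ uses activation magnitudes to identify the small fraction (around 1%) of weight channels that matter most, namely those multiplied by large activations. Instead of keeping them in higher precision, it scales those channels up before low-bit weight-only quantization and folds the inverse scale into the preceding operation. The scale is searched on calibration data.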

Write the equation for cross-entropy loss

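H(y, p) = -Σ_i y_i log(p_i), summed over all classes i. Here y is the true (often one-hot) distribution and p is the vector of predicted probabilities.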
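A small Python sketch of the formula. The eps floor on p_i is an added assumption to avoid log(0), and the target and predictions are hypothetical. With a one-hot y, the sum collapses to -log of the probability assigned to the correct class.

```python
import math

def cross_entropy(y, p, eps=1e-12):
    """H(y, p) = -sum_i y_i * log(p_i), in nats.

    Terms with y_i = 0 are skipped (they contribute 0); eps guards log(0)
    when the model assigns zero probability to a class with y_i > 0.
    """
    return -sum(yi * math.log(max(pi, eps)) for yi, pi in zip(y, p) if yi > 0)

y = [0.0, 1.0, 0.0]         # one-hot target: class 1 is correct
p = [0.2, 0.7, 0.1]         # predicted probabilities (hypothetical)
print(cross_entropy(y, p))  # equals -log(0.7), about 0.357
```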

What Boltzmann's entropy formula states — S = k ln Ω, connecting microscopic states to macroscopic entropy

Boltzmann's formula relates entropy (S) to the natural logarithm of the number of microstates (Ω), with k as Boltzmann's constant
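A worked example in Python. It assumes a toy system of N independent two-state spins with all microstates equally likely, so Ω = 2^N and S = N k ln 2. The constant uses the exact 2019 SI value of k (1.380649 × 10^-23 J/K).

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact in the 2019 SI)

def boltzmann_entropy(omega):
    """S = k_B ln(Omega) for Omega equally likely microstates."""
    return K_B * math.log(omega)

# Toy system: N independent two-state spins, so Omega = 2**N.
N = 100
S = boltzmann_entropy(2 ** N)
print(S)  # equals N * k_B * ln 2, about 9.57e-22 J/K
```

The logarithm is what makes entropy additive: doubling the number of microstates adds exactly k ln 2, regardless of how large Ω already is.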

Swipe through 100 ML concepts daily