As T approaches 0, softmax concentrates probability on the largest input; as T approaches ∞, the probabilities become uniform


Temperature T in softmax(x/T) controls entropy: as T→0 the output approaches a one-hot argmax; as T→∞ it approaches the uniform distribution
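A minimal sketch of the temperature effect, using only the standard library. The function names and the example logits are assumptions made for illustration; the max-subtraction step is a common numerical-stability trick and does not change the result.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    shift = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy_bits(probs):
    """Shannon entropy in bits; terms with p = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

logits = [2.0, 1.0, 0.0]  # hypothetical scores, chosen only for illustration
for t in (0.05, 1.0, 100.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs], round(entropy_bits(probs), 3))
```

At the smallest temperature nearly all probability sits on the largest logit and the entropy is close to 0; at the largest temperature the three probabilities are close to 1/3 and the entropy is close to log₂ 3.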


Related concepts

Entropy H = -Σ p(x) log₂ p(x) measures the average surprise of an outcome, in bits, and so quantifies the uncertainty in a system
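As a quick check of the formula, the sketch below computes entropy in bits for a few small distributions. The helper name and the example distributions are assumptions chosen for illustration.

```python
import math

def entropy_bits(probs):
    """H = -sum p(x) log2 p(x); outcomes with p = 0 are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

fair_coin = [0.5, 0.5]
biased_coin = [0.9, 0.1]
fair_die = [1 / 6] * 6

print(entropy_bits(fair_coin))    # 1.0
print(entropy_bits(biased_coin))  # about 0.469
print(entropy_bits(fair_die))     # about 2.585, i.e. log2(6)
```

The biased coin is more predictable than the fair one, so its entropy is lower; the fair die has more equally likely outcomes, so its entropy is higher.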

Shannon's source coding theorem: no lossless code can average fewer bits per symbol than the entropy of the source
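One way to see the bound in practice is to build a binary Huffman code and compare its average codeword length with the entropy. This is only an illustrative sketch: Huffman coding is one particular lossless symbol code, and the function names and probabilities below are assumptions.

```python
import heapq
import math

def huffman_code_lengths(probs):
    """Return the codeword length of each symbol in a binary Huffman code."""
    # Heap entries: (subtree probability, unique tie-breaker, symbols in subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1  # each merge adds one bit to every symbol below it
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

def entropy_bits(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def average_length(probs):
    lengths = huffman_code_lengths(probs)
    return sum(p * n for p, n in zip(probs, lengths))

for probs in ([0.5, 0.25, 0.125, 0.125], [0.7, 0.1, 0.1, 0.1]):
    print(round(entropy_bits(probs), 3), round(average_length(probs), 3))
```

For the first distribution, whose probabilities are all powers of 1/2, the average length equals the entropy; for the second it is strictly larger.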

Cross-entropy H(p,q) = -Σ p(x) log q(x) measures how well q approximates p; it is smallest, and equals the entropy of p, when q = p
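The sketch below evaluates the cross-entropy formula with the natural logarithm for a reference distribution p and two candidate approximations. The distributions and names are assumptions chosen for illustration.

```python
import math

def cross_entropy(p, q):
    """H(p, q) = -sum p(x) log q(x), in nats.

    Terms with p(x) = 0 are skipped. If q(x) = 0 where p(x) > 0,
    math.log raises ValueError (the cross-entropy is infinite there).
    """
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.7, 0.2, 0.1]        # reference distribution
q_close = [0.6, 0.3, 0.1]  # a reasonable approximation of p
q_far = [0.1, 0.2, 0.7]    # a poor approximation of p

print(cross_entropy(p, p))        # the entropy of p, the smallest possible value
print(cross_entropy(p, q_close))  # slightly larger
print(cross_entropy(p, q_far))    # much larger
```

The worse q matches p, the larger the cross-entropy, which is why it is used as a loss when fitting q to p.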

Softmax function

Softmax converts real numbers into a probability distribution

List of unsolved problems in mathematics

Random points in high dimensions are nearly equidistant: as the dimension grows, pairwise distances concentrate around their mean, so the spread of distances becomes small relative to the typical distance
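A small simulation can make this concrete. The sketch below assumes points drawn uniformly from the unit cube and Euclidean distance; the function name, sample size, and seed are illustrative choices, and `math.dist` requires Python 3.8 or later.

```python
import math
import random

def relative_spread(dim, n_points=50, seed=0):
    """Std / mean of pairwise Euclidean distances between random points."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    dists = [
        math.dist(points[i], points[j])
        for i in range(n_points)
        for j in range(i + 1, n_points)
    ]
    mean = sum(dists) / len(dists)
    std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    return std / mean

for dim in (2, 10, 100, 1000):
    print(dim, round(relative_spread(dim), 3))
```

The printed ratio shrinks as the dimension increases, meaning the distances between pairs of points look more and more alike.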

Entropy (information theory)

H(X) = −Σ_{x∈X} p(x) log p(x)
