Softmax converts real numbers into a probability distribution
Image: Unknown author, CC BY 4.0, via Wikimedia Commons
The softmax function takes a vector of real numbers, applies the exponential function to each element, and then normalizes the results by dividing each one by the sum of all the exponentials. Because every exponential is strictly positive, the outputs are positive and sum to one, which makes them a valid probability distribution. Softmax is particularly useful in neural networks for classification, where it converts the raw output scores (logits) into a probability for each class.
Example
Given the vector [2, 1, 0.1], softmax first computes the exponentials: exp(2) ≈ 7.389, exp(1) ≈ 2.718, exp(0.1) ≈ 1.105. It then divides each value by their sum, 7.389 + 2.718 + 1.105 = 11.212. The resulting softmax probabilities are approximately [0.659, 0.242, 0.099], which sum to one.
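The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name `softmax` and the variable names are illustrative choices, not taken from any particular framework.

```python
import math

def softmax(scores):
    """Exponentiate each score, then divide by the sum of the exponentials."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
print(round(sum(probs), 6))          # 1.0
```

The largest input keeps the largest probability, and the three outputs sum to one.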
Understanding the softmax function is crucial for interpreting neural network outputs in classification tasks, as it provides a clear probability distribution over possible outcomes.
to write a fused softmax kernel in Triton: load row, compute max, subtract, exp, sum, divide
`output = exp(row - max_val) / sum(exp(row - max_val))`
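The same sequence of steps (compute max, subtract, exp, sum, divide) can be sketched in plain Python for a single row. This is not a Triton kernel, only a standard-library analogue of the per-row arithmetic; the name `stable_softmax_row` is hypothetical. Subtracting the row maximum does not change the result, but it keeps every exponent at or below zero so `exp` cannot overflow.

```python
import math

def stable_softmax_row(row):
    max_val = max(row)                      # compute max
    shifted = [x - max_val for x in row]    # subtract (largest entry becomes 0)
    exps = [math.exp(x) for x in shifted]   # exp (every value is in (0, 1])
    denom = sum(exps)                       # sum
    return [e / denom for e in exps]        # divide

# math.exp(1000.0) alone would raise OverflowError; the shifted version is safe.
big = stable_softmax_row([1000.0, 999.0, 998.1])
small = stable_softmax_row([2.0, 1.0, 0.1])
print([round(p, 3) for p in big])
print([round(p, 3) for p in small])
```

Both rows differ only by a constant offset, so they should produce the same probabilities up to floating-point rounding.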
Write the attention score formula before softmax: e_ij = a(s_i, h_j)
Attention score formula: e_ij = a(s_i, h_j); applying softmax over j then gives the attention weights α_ij = exp(e_ij) / Σ_k exp(e_ik)
temperature T in softmax(x/T) controls entropy: T→0 is argmax, T→∞ is uniform
As T approaches 0, softmax concentrates nearly all probability on the largest element (argmax, low entropy); as T approaches ∞, the probabilities become uniform (maximum entropy)
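The effect of temperature can be illustrated with a short sketch that computes softmax(x/T) and its entropy for a cold, a neutral, and a hot temperature. The function names and the specific temperatures (0.05, 1.0, 100.0) are arbitrary choices made for illustration.

```python
import math

def softmax_with_temperature(scores, temperature):
    """Compute softmax(x / T), shifting by the max for numerical stability."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

scores = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(scores, 0.05)   # close to one-hot (argmax)
base = softmax_with_temperature(scores, 1.0)    # ordinary softmax
hot = softmax_with_temperature(scores, 100.0)   # close to uniform

for name, probs in (("cold", cold), ("base", base), ("hot", hot)):
    print(name, [round(p, 3) for p in probs], round(entropy(probs), 3))
```

Entropy should rise with temperature, approaching log(3) for three classes as the distribution flattens.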
self-attention: Attention(Q,K,V) = softmax(QK^T/√d_k)V
Attention(Q,K,V) = softmax(QK^T/√d_k)V
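The formula can be traced step by step with nested lists standing in for matrices. This is a minimal sketch with no batching, masking, or multiple heads; the function names and the toy values of Q, K, and V are made up for illustration.

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, with each matrix given as a list of rows."""
    d_k = len(K[0])
    scale = math.sqrt(d_k)
    output = []
    for q in Q:
        # One row of Q K^T / sqrt(d_k): scaled dot product of q with every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in K]
        weights = softmax(scores)  # attention weights for this query
        # Weighted average of the value rows.
        output.append([sum(w * v[j] for w, v in zip(weights, V))
                       for j in range(len(V[0]))])
    return output

Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out = attention(Q, K, V)
print(out)
```

Each output row is a convex combination of the rows of V, tilted toward the value whose key best matches the query.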
the Dirichlet distribution does: distribution over the probability simplex
The Dirichlet distribution generates random probability vectors, that is, points on the simplex whose non-negative entries sum to one
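One standard way to draw such a vector is to sample independent Gamma(α_i, 1) variates and divide by their sum. The sketch below uses Python's standard `random` module; the function name `sample_dirichlet`, the seed, and the concentration parameters are illustrative assumptions.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) sample by normalizing Gamma(alpha_i, 1) draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]

rng = random.Random(0)  # fixed seed so repeated runs give the same draw
sample = sample_dirichlet([1.0, 1.0, 1.0], rng)
print([round(p, 3) for p in sample], round(sum(sample), 6))
```

Like a softmax output, every sample is a valid probability vector. The difference is that the vector itself is random here, rather than computed from given scores.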
Normal distribution
Normal distribution PDF formula