As T approaches zero, softmax becomes argmax (a one-hot distribution), minimizing entropy; T→∞ yields a uniform distribution, maximizing entropy
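A minimal sketch of the temperature limits, using only the standard library. The helper names `softmax` and `entropy` and the example logits are illustrative assumptions, not taken from the card. Entropy should shrink toward 0 as T gets small and approach ln(n) as T grows.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, stabilized by subtracting the max logit."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

logits = [2.0, 1.0, 0.1]  # hypothetical example values
for t in (0.05, 1.0, 100.0):
    # Low T concentrates mass on the largest logit; high T flattens it.
    print(f"T={t}: entropy={entropy(softmax(logits, t)):.4f}")
```

For three classes the upper bound is ln 3, which the high-temperature case approaches.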
Write the attention score formula before softmax: e_ij = a(s_i, h_j)
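Attention score formula (additive form): e_ij = a(s_i, h_j) = v^T tanh(W_s * s_i + W_h * h_j + b); the exp is applied afterwards, inside the softmax that turns scores into attention weights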
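A rough sketch of an additive (tanh-based) attention score in plain Python, assuming a projection vector `v` that maps the tanh output to a scalar. The function name, the list-of-lists matrices, and the example values are all hypothetical. The score is the pre-softmax quantity, so no exponential is taken here.

```python
import math

def additive_score(s_i, h_j, W_s, W_h, b, v):
    """Pre-softmax score: v . tanh(W_s s_i + W_h h_j + b)."""
    def matvec(W, x):
        return [sum(w * xk for w, xk in zip(row, x)) for row in W]
    pre = [a + c + bias
           for a, c, bias in zip(matvec(W_s, s_i), matvec(W_h, h_j), b)]
    return sum(vk * math.tanh(z) for vk, z in zip(v, pre))

# Hypothetical 2-dimensional example with identity projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
score = additive_score(
    s_i=[0.5, 0.0], h_j=[0.5, 0.0],
    W_s=identity, W_h=identity, b=[0.0, 0.0], v=[1.0, 0.0],
)
print(score)
```

Attention weights would then come from a softmax over the scores e_ij for all j, which is where the exponential enters.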
What the second law of thermodynamics says — entropy of an isolated system never decreases
Entropy in an isolated system always increases or remains constant
Why SGD with momentum escapes local minima better than vanilla SGD
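Momentum SGD accumulates a velocity from past gradients, which can carry the iterate through shallow local minima and flat regions where the current gradient alone would stall vanilla SGD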
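The velocity accumulation can be sketched with the classical momentum update v ← μv − η∇f, x ← x + v. This is a minimal illustration under assumed names (`sgd_step`, `momentum_step`, `run`) and an assumed constant gradient; it shows only that momentum covers more distance per step, not a full escape from a local minimum.

```python
def sgd_step(x, grad, lr):
    """Vanilla SGD: move against the current gradient only."""
    return x - lr * grad

def momentum_step(x, v, grad, lr, mu=0.9):
    """Classical momentum: blend the previous velocity with the new gradient."""
    v = mu * v - lr * grad
    return x + v, v

def run(grad_fn, x0, steps, lr, mu=None):
    """Run `steps` updates; mu=None selects vanilla SGD."""
    x, v = x0, 0.0
    for _ in range(steps):
        g = grad_fn(x)
        if mu is None:
            x = sgd_step(x, g, lr)
        else:
            x, v = momentum_step(x, v, g, lr, mu)
    return x

constant_grad = lambda x: 1.0  # hypothetical slope of 1 everywhere
print(run(constant_grad, 0.0, steps=3, lr=0.1))          # vanilla
print(run(constant_grad, 0.0, steps=3, lr=0.1, mu=0.9))  # momentum
```

With the same learning rate, the momentum run travels farther because each step adds to the velocity left over from the previous ones.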
What AWQ does differently — activation-aware weight quantization preserves important weights
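AWQ uses activation statistics to find the small fraction of salient weight channels, then protects them with per-channel scaling before low-bit quantization instead of treating all weights equally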
Write the equation for cross-entropy loss
H(y, p) = -Σ_i y_i * log(p_i), where y is the true distribution and p is the predicted distribution
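A small worked example of the formula in plain Python. The function name and the `eps` clamp (added to avoid log(0)) are assumptions for illustration. With a one-hot target, the sum reduces to the negative log-probability of the true class.

```python
import math

def cross_entropy(y, p, eps=1e-12):
    """H(y, p) = -sum_i y_i * log(p_i), in nats; p_i is clamped at eps."""
    return -sum(yi * math.log(max(pi, eps)) for yi, pi in zip(y, p))

y_true = [0.0, 1.0, 0.0]   # one-hot target, class 1
p_pred = [0.2, 0.7, 0.1]   # hypothetical predicted probabilities
print(cross_entropy(y_true, p_pred))
```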
What Boltzmann's entropy formula states — S = k ln Ω, connecting microscopic states to macroscopic entropy
Boltzmann's formula relates entropy (S) to the natural logarithm of the number of microstates (Ω), with k as Boltzmann's constant
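As a quick numeric illustration, the formula can be evaluated for a toy system of N independent two-state particles, where Ω = 2^N and so S = N k ln 2. The function name and the toy system are assumptions chosen for the example.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact in the 2019 SI)

def boltzmann_entropy(omega):
    """S = k ln(omega) for omega equally likely microstates."""
    return K_B * math.log(omega)

n_particles = 100                      # hypothetical toy system
omega = 2 ** n_particles               # each particle has two states
print(boltzmann_entropy(omega))        # equals n_particles * K_B * ln 2
```

A single microstate (Ω = 1) gives S = 0, and doubling Ω always adds k ln 2.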