
Perplexity = 2^H
Image: Guillaume Jacquenot, CC BY-SA 4.0, via Wikimedia Commons
Perplexity measures the uncertainty in a probability distribution. It is calculated as 2 raised to the power of the information entropy H, measured in bits. It can be read as the effective number of equally likely outcomes: the lower the perplexity, the better the distribution predicts outcomes.
Example
For a fair coin (N = 2 outcomes, each with probability 0.5), H = 0.5 · log₂(1/0.5) + 0.5 · log₂(1/0.5) = 1 bit, so Perplexity = 2^1 = 2.
Understanding perplexity helps in evaluating how well a model predicts outcomes, which is crucial for applications like speech recognition.
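The coin example above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation; the function names `entropy_bits` and `perplexity` and the biased-coin probabilities are assumptions introduced here for demonstration.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits; outcomes with probability 0 contribute nothing."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def perplexity(probs):
    """Perplexity = 2^H, with H measured in bits."""
    return 2 ** entropy_bits(probs)

fair_coin = [0.5, 0.5]
biased_coin = [0.9, 0.1]  # hypothetical biased coin, for comparison only

print(entropy_bits(fair_coin))   # 1.0
print(perplexity(fair_coin))     # 2.0
print(perplexity(biased_coin))   # smaller than 2: the outcome is easier to predict
```

A uniform distribution over N outcomes has perplexity N, which is why perplexity is often described as an effective number of choices.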
Entropy (information theory)
H(X) = −Σ_{x∈X} p(x) log p(x)
Cross-entropy
Cross-entropy H(p, q) = -Σ p(x) log q(x) measures how well a distribution q approximates a distribution p. The lower the value, the better the approximation; it reaches its minimum, the entropy of p, when q = p.
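As a rough sketch of the cross-entropy formula, the snippet below compares a matching and a mismatched model distribution against a fair coin. The name `cross_entropy_bits` and the example distributions are assumptions made for illustration, and the base-2 logarithm is chosen here so the result is in bits.

```python
import math

def cross_entropy_bits(p, q):
    """H(p, q) = -sum of p(x) * log2 q(x), skipping outcomes where p(x) = 0.

    Assumes q(x) > 0 wherever p(x) > 0; otherwise math.log2 raises ValueError.
    """
    return sum(-px * math.log2(qx) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.5]        # true distribution (fair coin)
q_good = [0.5, 0.5]   # hypothetical model that matches p
q_bad = [0.9, 0.1]    # hypothetical model that is confidently wrong

h_good = cross_entropy_bits(p, q_good)  # equals the entropy of p
h_bad = cross_entropy_bits(p, q_bad)    # larger, because q_bad approximates p poorly

print(h_good, h_bad)
print(2 ** h_bad)  # perplexity of q_bad when evaluated on outcomes drawn from p
```

Raising 2 to the cross-entropy gives the perplexity of the model q on data from p, which is the form commonly reported when evaluating predictive models.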
Entropy H = -Σ p(x) log₂ p(x) measures the average surprise, in bits, of an outcome drawn from p, and so quantifies the uncertainty in a system.
Jensen–Shannon divergence
Jensen-Shannon divergence formula: D_JS(P || Q) = 1/2 * D_KL(P || M) + 1/2 * D_KL(Q || M), where M = (P + Q) / 2 is the average of the two distributions
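The Jensen-Shannon formula can be sketched directly from its two KL terms. This is a minimal illustration assuming M is the midpoint mixture (P + Q) / 2, which is the standard definition; the helper names `kl_bits` and `js_divergence_bits` and the test distributions are introduced here and are not from the source.

```python
import math

def kl_bits(p, q):
    """D_KL(p || q) in bits, skipping outcomes where p(x) = 0."""
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

def js_divergence_bits(p, q):
    """D_JS(P || Q) = 1/2 D_KL(P || M) + 1/2 D_KL(Q || M), with M = (P + Q) / 2."""
    m = [(px + qx) / 2 for px, qx in zip(p, q)]
    return 0.5 * kl_bits(p, m) + 0.5 * kl_bits(q, m)

same = js_divergence_bits([0.5, 0.5], [0.5, 0.5])      # identical distributions
disjoint = js_divergence_bits([1.0, 0.0], [0.0, 1.0])  # no overlap at all

print(same)      # 0.0
print(disjoint)  # 1.0
```

Because M is positive wherever P or Q is, the divergence stays finite even when the two distributions do not overlap, and with base-2 logarithms it is bounded between 0 and 1.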
A fair die has entropy of log₂(6) ≈ 2.58 bits