
L(x,λ) = f(x) - λg(x) for the constraint g(x) = 0, where λ is the Lagrange multiplier
What mixup does: trains on convex combinations of pairs: x̃=λx_i+(1-λ)x_j
Trains on convex combinations of pairs of training examples: x̃=λx_i+(1-λ)x_j, a weighted average of two inputs with λ in [0,1]; the labels are mixed with the same λ: ỹ=λy_i+(1-λ)y_j
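A minimal sketch of the mixing step, assuming one-hot labels that are blended with the same λ and a λ drawn from Beta(α, α), as in the usual mixup formulation. The function name `mixup_pair`, the default `alpha=0.2`, and the toy vectors are illustrative assumptions, not part of the card.

```python
import random

def mixup_pair(x_i, y_i, x_j, y_j, alpha=0.2, rng=random):
    """Blend two (features, one-hot label) examples with one shared weight."""
    # Assumption: lambda ~ Beta(alpha, alpha), which always lies in [0, 1].
    lam = rng.betavariate(alpha, alpha)
    x_mix = [lam * a + (1 - lam) * b for a, b in zip(x_i, x_j)]
    y_mix = [lam * a + (1 - lam) * b for a, b in zip(y_i, y_j)]
    return x_mix, y_mix, lam

# Toy example (made-up data): two 2-D inputs from two different classes.
rng = random.Random(0)
x_mix, y_mix, lam = mixup_pair([1.0, 0.0], [1.0, 0.0],
                               [0.0, 2.0], [0.0, 1.0], rng=rng)
```

Because the label weights are λ and 1-λ, the mixed label still sums to 1 and can be used directly as a soft target.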
Why proximal gradient descent is needed for L1 optimization
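The L1 penalty is not differentiable at zero, so plain gradient descent does not apply directly. Proximal gradient descent takes a gradient step on the smooth loss, then applies the L1 proximal operator (soft-thresholding), which sets small coefficients exactly to zero and so yields sparse solutions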
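A small sketch of proximal gradient descent (ISTA) on a lasso objective, 0.5*||Xw - y||^2 + lam*||w||_1. The objective, the function names, and the toy data are assumptions chosen for illustration; the step size must not exceed 1/L, where L is the largest eigenvalue of X^T X (L = 1 for the identity design used here).

```python
def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrink v toward zero by t, clipping at zero."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def proximal_gradient_lasso(X, y, lam, step, n_iter=200):
    """Minimize 0.5*||Xw - y||^2 + lam*||w||_1 with ISTA iterations."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        # Gradient of the smooth part: X^T (Xw - y)
        residual = [sum(X[i][k] * w[k] for k in range(d)) - y[i] for i in range(n)]
        grad = [sum(X[i][k] * residual[i] for i in range(n)) for k in range(d)]
        # Gradient step, then the L1 proximal step
        w = [soft_threshold(w[k] - step * grad[k], step * lam) for k in range(d)]
    return w

# Toy problem (made-up numbers): identity design, one large and one small target.
w = proximal_gradient_lasso([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.2], lam=0.5, step=0.5)
```

With an identity design the exact minimizer is the soft-thresholded target, so the small coefficient ends at exactly 0.0 rather than merely close to zero.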
Write the formula for KL divergence D_KL(P||Q)
D_KL(P||Q) = Σ P(x) log(P(x)/Q(x)) for all x in the support of P
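A short sketch of the sum above for discrete distributions stored as aligned lists. The function name and the example distributions are made up; the natural log is assumed, so the result is in nats.

```python
import math

def kl_divergence(p, q):
    """D_KL(P||Q) in nats for discrete distributions given as aligned lists."""
    total = 0.0
    for p_x, q_x in zip(p, q):
        if p_x == 0.0:
            continue  # outcomes outside the support of P contribute nothing
        if q_x == 0.0:
            return math.inf  # P puts mass where Q has none
        total += p_x * math.log(p_x / q_x)
    return total

p = [0.5, 0.5]
q = [0.9, 0.1]
forward = kl_divergence(p, q)   # D_KL(P||Q)
reverse = kl_divergence(q, p)   # D_KL(Q||P), generally a different value
```

Comparing `forward` and `reverse` shows that KL divergence is not symmetric, which is why the argument order in D_KL(P||Q) matters.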
Write the formula for Mahalanobis distance
D^2 = (x - μ)^T Σ^(-1) (x - μ), so D = sqrt((x - μ)^T Σ^(-1) (x - μ)), where μ is the mean and Σ is the covariance matrix
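A sketch of the formula restricted to two dimensions, so that Σ^(-1) can be written out by hand without a linear algebra library. The function name and the sample covariance matrices are assumptions for illustration.

```python
import math

def mahalanobis_2d(x, mu, cov):
    """Mahalanobis distance for 2-D points; cov is a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Closed-form inverse of a 2x2 matrix
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mu[0], x[1] - mu[1]]
    # D^2 = dx^T * inv * dx
    d2 = sum(dx[r] * inv[r][s] * dx[s] for r in range(2) for s in range(2))
    return math.sqrt(d2)

cov = [[4.0, 0.0], [0.0, 1.0]]  # variance 4 along the first axis, 1 along the second
along_wide_axis = mahalanobis_2d([2.0, 0.0], [0.0, 0.0], cov)
along_narrow_axis = mahalanobis_2d([0.0, 2.0], [0.0, 0.0], cov)
```

Both points are 2 units from the mean in Euclidean terms, but the one along the high-variance axis is closer in Mahalanobis terms. With an identity covariance the distance reduces to the Euclidean distance.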
Write the equation for cross-entropy loss
H(y, p) = -Σ_i y_i * log(p_i), where y_i is the true probability of class i and p_i is the predicted probability
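A minimal sketch of the loss for a single example, assuming y is a one-hot (or soft) label vector and p a vector of predicted class probabilities. The `eps` clamp is an added assumption to avoid log(0); it is not part of the formula.

```python
import math

def cross_entropy(y, p, eps=1e-12):
    """H(y, p) = -sum_i y_i * log(p_i), in nats, for one example."""
    # eps is a hypothetical safeguard so a predicted 0.0 does not hit log(0)
    return -sum(y_i * math.log(max(p_i, eps)) for y_i, p_i in zip(y, p))

y_true = [0.0, 1.0, 0.0]          # one-hot label: class 1 is correct
confident = cross_entropy(y_true, [0.05, 0.90, 0.05])
unsure = cross_entropy(y_true, [0.20, 0.70, 0.10])
```

With a one-hot label only the true class term survives, so the loss is the negative log of the probability assigned to the correct class and drops as that probability rises.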
What maximum likelihood estimation does: find θ maximizing P(data|θ)
Finds the parameter value θ under which the observed data is most probable: θ̂ = argmax_θ P(data|θ)
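A sketch of maximum likelihood for a coin-flip (Bernoulli) model, where θ is the probability of heads. The data, the grid search, and the function name are illustrative assumptions; the log-likelihood is maximized instead of the likelihood because both peak at the same θ and sums are numerically safer than products.

```python
import math

def bernoulli_log_likelihood(theta, data):
    """log P(data | theta) for independent 0/1 outcomes."""
    return sum(math.log(theta) if x == 1 else math.log(1 - theta) for x in data)

data = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]  # made-up coin flips: 7 heads in 10

# Brute-force search over candidate values of theta in (0, 1)
grid = [k / 1000 for k in range(1, 1000)]
theta_hat = max(grid, key=lambda t: bernoulli_log_likelihood(t, data))

# For this model the maximizer has a closed form: the fraction of heads
closed_form = sum(data) / len(data)
```

The grid search and the closed form agree, which illustrates that the estimate is the θ under which the observed flips are most probable.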