
LASSO minimizes the sum of squared residuals plus an L1 penalty λ∑|β|, driving some coefficients exactly to zero and thereby performing feature selection
Image: Shailaja.k, CC BY-SA 3.0, via Wikimedia Commons
Ridge regression minimizes the sum of squared residuals plus an L2 penalty λ∑β²
The L2 penalty shrinks coefficients toward zero without setting them exactly to zero
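The shrinkage described above can be checked with a minimal sketch. It assumes a ridge model with no intercept, where the minimizer has the closed form (XᵀX + λI)⁻¹Xᵀy. The function name `ridge_fit`, the synthetic data, and the value λ = 10 are illustrative assumptions, not part of the original card.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution (no intercept): (X^T X + lam * I)^{-1} X^T y."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    # Solve the linear system instead of forming an explicit inverse.
    return np.linalg.solve(A, X.T @ y)

# Synthetic data (assumed for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=50)

beta_ols = ridge_fit(X, y, lam=0.0)     # lam = 0 reduces to ordinary least squares
beta_ridge = ridge_fit(X, y, lam=10.0)  # lam > 0 shrinks the coefficient vector

print("OLS:  ", beta_ols)
print("Ridge:", beta_ridge)
```

The ridge coefficients come out smaller in norm than the least-squares ones, but none of them is exactly zero.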
L1 vs L2 regularization: L1 gives sparsity (feature selection), L2 gives small weights
L1 regularization penalizes λ∑|β| and sets some weights exactly to zero; L2 regularization penalizes λ∑β² and keeps all weights small but nonzero
Proximal gradient methods for learning
Proximal gradient descent handles the non-differentiable L1 penalty by alternating a gradient step on the smooth loss with a proximity operator, which for the L1 penalty is soft-thresholding
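As a rough sketch of this idea, the snippet below applies proximal gradient descent (often called ISTA) to the LASSO objective 0.5‖Xb − y‖² + λ‖b‖₁. The names `soft_threshold` and `ista_lasso`, the fixed step size 1/L, the iteration count, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximity operator of t * ||.||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_lasso(X, y, lam, n_iter=500):
    """Minimize 0.5 * ||X b - y||^2 + lam * ||b||_1 by proximal gradient descent."""
    # Step size 1/L, where L = (largest singular value of X)^2 is the
    # Lipschitz constant of the gradient of the smooth least-squares term.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y)                           # gradient of the smooth part
        b = soft_threshold(b - step * grad, step * lam)    # proximal step for the L1 part
    return b

# Synthetic data with a sparse ground truth (assumed for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
beta_true = np.array([3.0, 0.0, 0.0, -2.0, 0.0])
y = X @ beta_true + 0.1 * rng.normal(size=100)

b_hat = ista_lasso(X, y, lam=20.0)
print(b_hat)
```

The soft-thresholding step is what produces exact zeros: any entry whose magnitude falls below the threshold after the gradient step is set to zero rather than merely made small.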
Standardize features to zero mean and unit variance before gradient-based optimization or penalized regression: comparable feature scales improve conditioning and keep the penalty from depending on feature units
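A minimal sketch of standardization with NumPy follows. The helper name `standardize`, the handling of constant columns, and the example data are assumptions for illustration. In practice the mean and standard deviation would be computed on training data and reused for new data.

```python
import numpy as np

def standardize(X):
    """Return column-wise standardized X together with the statistics used."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Assumption: leave constant columns unscaled to avoid division by zero.
    std = np.where(std == 0, 1.0, std)
    return (X - mean) / std, mean, std

# Two features on very different scales (assumed for illustration only).
rng = np.random.default_rng(0)
X_raw = np.column_stack([
    rng.normal(loc=1000.0, scale=250.0, size=200),
    rng.normal(loc=0.5, scale=0.01, size=200),
])

X_std, mu, sigma = standardize(X_raw)
print(X_std.mean(axis=0), X_std.std(axis=0))
```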
LoRA (machine learning)
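LoRA freezes the pretrained weight matrix and learns a low-rank update BA with rank r << d, so far fewer parameters are trained during adaptation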
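To make the r << d saving concrete, here is a small NumPy sketch of a single LoRA-style layer: the frozen weight W is left untouched and only the factors B (d × r) and A (r × k) would be trained. The dimensions, the initialization scale, and the omission of the usual scaling factor are assumptions chosen for illustration.

```python
import numpy as np

d, k, r = 512, 512, 8                      # assumed layer size and rank, with r << d
rng = np.random.default_rng(0)

W = rng.normal(size=(d, k))                # frozen pretrained weight (not trained)
A = rng.normal(scale=0.01, size=(r, k))    # trainable factor, small random init
B = np.zeros((d, r))                       # trainable factor, zero init

x = rng.normal(size=k)
# Adapted forward pass: W x + B (A x). The product B @ A is never formed explicitly.
h = W @ x + B @ (A @ x)

full_params = d * k                        # parameters in a full fine-tune of W
lora_params = r * (d + k)                  # parameters in B and A combined
print(full_params, lora_params)
```

Because B starts at zero, the adapted layer initially reproduces the frozen layer exactly, and training only has to move the low-rank factors.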
Convex optimization
For a convex function, every local minimum is a global minimum; a strictly convex function has at most one minimizer