LASSO minimizes the squared-error cost plus an L1 penalty λ∑|β|, driving some coefficients to exactly zero and thereby performing feature selection

Image: Shailaja.k, CC BY-SA 3.0, via Wikimedia Commons

LASSO uses L1 to do feature selection by driving coefficients to exactly zero
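For reference, one common way to write the LASSO objective is sketched below. The symbols X (design matrix), y (targets), n (number of samples), and the 1/(2n) scaling are assumptions of this sketch; other texts scale the squared-error term differently.

```latex
\hat{\beta} = \arg\min_{\beta} \; \frac{1}{2n} \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1,
\qquad \lVert \beta \rVert_1 = \sum_j \lvert \beta_j \rvert
```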


Related concepts

Ridge regression uses L2 to shrink coefficients without eliminating them

Ridge regression minimizes the sum of squared residuals plus an L2 penalty λ∑β²
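A minimal sketch of that objective for the simplest case, a single slope with no intercept, where setting the derivative to zero gives β = ∑xy / (∑x² + λ). The function name `ridge_1d` and the toy data are illustrative assumptions, not part of the card.

```python
def ridge_1d(x, y, lam):
    """Minimize sum((y_i - b * x_i)**2) + lam * b**2 over a single slope b.

    Hypothetical helper for illustration: one feature, no intercept.
    """
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)


x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]             # exactly y = 2x

b_ols = ridge_1d(x, y, 0.0)     # no penalty: ordinary least squares slope
b_ridge = ridge_1d(x, y, 14.0)  # penalty shrinks the slope toward zero
print(b_ols, b_ridge)
```

With λ = 0 the slope is the least-squares value; increasing λ pulls it toward zero but it never becomes exactly zero for finite λ.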

L1 vs L2 regularization: L1 gives sparsity (feature selection), L2 gives small weights

L1 regularization adds a penalty λ∑|β| and produces sparse weights (some exactly zero); L2 regularization adds a penalty λ∑β² and produces small but generally nonzero weights
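The contrast can be seen in a one-dimensional sketch: penalizing a single value z with L1 gives soft-thresholding, while penalizing it with L2 gives proportional shrinkage. The function names and the 0.5 scaling on the squared terms are assumptions chosen to keep the formulas simple.

```python
def l1_shrink(z, lam):
    """argmin over b of 0.5 * (b - z)**2 + lam * abs(b): soft-thresholding."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # small inputs are set exactly to zero


def l2_shrink(z, lam):
    """argmin over b of 0.5 * (b - z)**2 + 0.5 * lam * b**2: scale toward zero."""
    return z / (1.0 + lam)


zs = [3.0, 0.5, -0.2, -2.0]
l1 = [l1_shrink(z, 1.0) for z in zs]
l2 = [l2_shrink(z, 1.0) for z in zs]
print(l1)
print(l2)
```

In this toy run the L1 version zeroes the two small inputs, while the L2 version only halves every input.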

Proximal gradient methods for learning

Proximal gradient descent efficiently handles non-differentiable L1 regularization by combining gradient descent with a proximity operator
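A minimal sketch of proximal gradient descent (often called ISTA) for the L1-penalized least-squares problem, in plain Python. The objective scaling 1/(2n), the fixed step size, the function names, and the toy data are all assumptions of this sketch; a practical implementation would use a linear algebra library and a step size tied to the data.

```python
def soft_threshold(v, t):
    """Proximity operator of t * ||.||_1, applied elementwise."""
    out = []
    for x in v:
        if x > t:
            out.append(x - t)
        elif x < -t:
            out.append(x + t)
        else:
            out.append(0.0)
    return out


def ista(X, y, lam, step, n_iter=200):
    """Minimize (1/(2n)) * ||X b - y||^2 + lam * ||b||_1 (assumed scaling)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        # Residuals r = X b - y
        r = [sum(X[i][j] * b[j] for j in range(p)) - y[i] for i in range(n)]
        # Gradient of the smooth squared-error part: X^T r / n
        grad = [sum(X[i][j] * r[i] for i in range(n)) / n for j in range(p)]
        # Gradient step on the smooth part, then the proximity operator for L1
        b = soft_threshold([b[j] - step * grad[j] for j in range(p)], step * lam)
    return b


# Toy data with orthogonal columns; y = 2.0 * col0 + 0.1 * col1
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.1, 1.9, -1.9, -2.1]
b = ista(X, y, lam=0.5, step=0.5)
print(b)
```

On this toy problem the weak second coefficient is driven exactly to zero, while the strong first coefficient is kept but shrunk by λ.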

When to standardize: when gradient-based optimization needs zero mean and unit variance

Standardize features to zero mean and unit variance when gradient-based optimization would otherwise be slowed by features on very different scales
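As a small sketch, standardizing one feature means subtracting its mean and dividing by its standard deviation. The helper name `standardize` and the sample values are assumptions; the population standard deviation is used here, and a constant feature (zero deviation) would need separate handling.

```python
import statistics


def standardize(values):
    """Return (v - mean) / std for each value, using the population std."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # assumes the values are not all equal
    return [(v - mu) / sigma for v in values]


z = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(z)
```

The resulting values have mean 0 and standard deviation 1, so every feature treated this way lands on a comparable scale.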

LoRA (machine learning)

LoRA freezes the pretrained weights and learns a low-rank update of rank r << d, so far fewer parameters are trained during adaptation
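A back-of-the-envelope sketch of why r << d helps: a full update to a d_out × d_in weight matrix has d_out · d_in entries, while two rank-r factors B (d_out × r) and A (r × d_in) have r · (d_out + d_in). The function name and the sizes 4096 and 8 are illustrative assumptions.

```python
def lora_param_counts(d_out, d_in, r):
    """Trainable parameters: full update vs. rank-r factors B and A."""
    full = d_out * d_in
    low_rank = r * (d_out + d_in)
    return full, low_rank


full, low_rank = lora_param_counts(4096, 4096, 8)
print(full, low_rank, full // low_rank)
```

For these assumed sizes the low-rank factors hold 256 times fewer trainable parameters than a full update.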

Convex optimization

For a convex function, every local minimum is a global minimum; a strictly convex function has at most one minimizer
