Ridge regression uses an L2 penalty to shrink coefficients without eliminating them

Ridge regression minimizes the sum of squared residuals plus an L2 penalty, λ∑βⱼ², which shrinks coefficients toward zero without setting them exactly to zero
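
As a minimal sketch of the shrinkage effect, the ridge objective has a closed-form solution, β = (XᵀX + λI)⁻¹Xᵀy. The function name `ridge_fit`, the synthetic data, and the choice to omit an intercept are assumptions made for illustration only.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution (no intercept): solve (X^T X + lam * I) beta = X^T y."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

# Synthetic data (illustrative assumption, not from the source).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_beta = np.array([2.0, -1.0, 0.5])
y = X @ true_beta + 0.1 * rng.normal(size=50)

beta_ols = ridge_fit(X, y, lam=0.0)     # lam = 0 reduces to ordinary least squares
beta_ridge = ridge_fit(X, y, lam=10.0)  # a larger lam shrinks the coefficient vector

print("OLS:  ", beta_ols)
print("Ridge:", beta_ridge)
```

With λ > 0 the coefficient vector has a smaller norm than the least-squares fit, but its entries stay nonzero.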

Related concepts

LASSO uses an L1 penalty to perform feature selection by driving some coefficients to exactly zero

LASSO minimizes the sum of squared residuals plus an L1 penalty, λ∑|βⱼ|, which drives some coefficients to exactly zero and so performs feature selection
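
One common way to see where the exact zeros come from is coordinate descent with a soft-thresholding step. The sketch below assumes the objective (1/2n)‖y − Xβ‖² + λ‖β‖₁ with no intercept. The names `soft_threshold` and `lasso_cd`, the fixed iteration count, and the synthetic data are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; values with |z| <= t become exactly zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iters=200):
    """Coordinate descent for (1/2n)||y - X beta||^2 + lam * ||beta||_1 (no intercept)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # per-feature scale (1/n) * x_j^T x_j
    for _ in range(n_iters):
        for j in range(p):
            # Residual with feature j's current contribution added back in.
            partial_residual = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ partial_residual / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta

# Synthetic data in which only features 0 and 3 carry signal (illustrative assumption).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
y = X @ np.array([3.0, 0.0, 0.0, -2.0]) + 0.1 * rng.normal(size=100)

beta_lasso = lasso_cd(X, y, lam=0.5)
print(beta_lasso)
```

The thresholding step is what sets the coefficients of weak features to exactly zero, rather than merely making them small.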

L1 vs L2 regularization: L1 gives sparsity (feature selection), L2 gives small weights

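L1 regularization penalizes the sum of absolute values, λ∑|βⱼ|, and drives some weights to exactly zero (sparsity); L2 regularization penalizes the sum of squares, λ∑βⱼ², and shrinks all weights toward zero without eliminating any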

When to standardize: when you need zero mean and unit variance for gradient-based optimization

Standardize features (subtract the mean, divide by the standard deviation) when gradient-based optimization requires zero-mean, unit-variance inputs
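
A minimal sketch of standardization with NumPy, assuming features are stored as columns. The helper name `standardize` and the `eps` guard against constant columns are illustrative assumptions.

```python
import numpy as np

def standardize(X, eps=1e-12):
    """Return column-wise z-scores plus the mean and std used to compute them."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Guard against division by zero for constant columns.
    Z = (X - mean) / np.maximum(std, eps)
    return Z, mean, std

# Two features on very different scales (illustrative data).
X_raw = np.array([[1.0, 100.0],
                  [2.0, 200.0],
                  [3.0, 300.0]])
Z, mean, std = standardize(X_raw)
print(Z.mean(axis=0), Z.std(axis=0))
```

In practice the mean and standard deviation would be computed on the training data only and then reused to transform validation and test data.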

Regularization (mathematics)

L1 regularization results in sparse solutions

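Batch size affects generalization: larger batches tend to find sharper minima

Larger batch sizes give more accurate gradient estimates but tend to converge to sharper minima, which are associated with worse generalization; the gradient noise of smaller batches tends to favor flatter minima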

Rate-distortion theory: minimum bits to represent data within distortion D

The rate-distortion function R(D) gives the minimum number of bits per symbol needed to represent a source with expected distortion at most D
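
As a worked example, a Gaussian source with variance σ² under squared-error distortion has the known closed form R(D) = ½ log₂(σ²/D) for 0 < D < σ², and R(D) = 0 for D ≥ σ². The function name `gaussian_rate_distortion` is an illustrative assumption.

```python
import math

def gaussian_rate_distortion(variance, distortion):
    """R(D) in bits per sample for a Gaussian source under squared-error distortion."""
    if variance <= 0 or distortion <= 0:
        raise ValueError("variance and distortion must be positive")
    if distortion >= variance:
        # Allowed distortion already covers the source variance: no bits needed.
        return 0.0
    return 0.5 * math.log2(variance / distortion)

rate = gaussian_rate_distortion(variance=1.0, distortion=0.25)
print(rate)
```

Here cutting the allowed distortion to a quarter of the source variance costs 1 bit per sample, and each further fourfold reduction costs one more bit.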
