Why L1 regularization produces sparse solutions: the diamond's corners lie on the axes

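L1 regularization adds a penalty proportional to the sum of the absolute values of the coefficients. Its constraint region is a diamond whose corners lie on the coordinate axes, so the loss contours often first touch it at a corner, where one or more coefficients are exactly zero.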
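A one-coefficient sketch can make the contrast with L2 concrete. It assumes a toy objective 0.5*(w - a)^2 plus a penalty, where a is the unpenalized solution. The function names and the values of a and lam are illustrative choices, not part of any library.

```python
def l1_minimizer(a: float, lam: float) -> float:
    """Minimizer of 0.5*(w - a)**2 + lam*abs(w): shrink toward zero, clip at zero."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0  # any |a| <= lam is mapped to exactly zero


def l2_minimizer(a: float, lam: float) -> float:
    """Minimizer of 0.5*(w - a)**2 + 0.5*lam*w**2: rescale, never exactly zero."""
    return a / (1.0 + lam)


unpenalized = [3.0, 0.4, -0.2, -1.5]  # hypothetical unpenalized coefficients
lam = 0.5                             # hypothetical penalty strength

l1_solution = [l1_minimizer(a, lam) for a in unpenalized]
l2_solution = [l2_minimizer(a, lam) for a in unpenalized]

print("L1:", l1_solution)  # L1: [2.5, 0.0, 0.0, -1.0]
print("L2:", l2_solution)  # all entries shrunk, none exactly zero
```

In this toy setting the L1 penalty zeroes the two small coefficients, while the L2 penalty only scales every coefficient down.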

Related concepts

Why proximal gradient descent is needed for L1 optimization

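The L1 penalty is not differentiable at zero, so plain gradient descent is not well defined there, and subgradient steps rarely land on exact zeros. Proximal gradient descent takes a gradient step on the smooth loss, then applies the proximal operator of the L1 term (soft-thresholding), which sets small coefficients exactly to zero.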
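As a rough illustration, the sketch below runs proximal gradient descent (often called ISTA) on a tiny lasso problem, 0.5*||Xw - y||^2 + lam*||w||_1, in plain Python. The data, lam, the step size, and the function names are made-up assumptions chosen to keep the example small.

```python
def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t*abs(.): shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(X, y, lam, step, iters):
    """Proximal gradient descent for 0.5*||Xw - y||^2 + lam*||w||_1."""
    n_features = len(X[0])
    w = [0.0] * n_features
    for _ in range(iters):
        # Gradient of the smooth part: X^T (Xw - y)
        residual = [sum(x * wj for x, wj in zip(row, w)) - yi
                    for row, yi in zip(X, y)]
        grad = [sum(row[j] * r for row, r in zip(X, residual))
                for j in range(n_features)]
        # Gradient step, then the L1 proximal step
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad)]
    return w


# Hypothetical data: y is generated from the sparse vector [2, 0, -1]
X = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 1.0, 0.0]]
y = [2.0, 0.0, -1.0, 2.0]

# The largest eigenvalue of X^T X is 3 here, so step = 1/3 is a safe choice
w = ista(X, y, lam=0.5, step=1.0 / 3.0, iters=200)
print(w)
```

The soft-thresholding step is what returns the middle coefficient to exactly 0.0 rather than a small non-zero value.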

Why the L1 unit ball is a diamond shape and the L2 unit ball is a circle

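The L1 norm (Manhattan distance) sums absolute values, so its unit ball |x1| + |x2| ≤ 1 has straight edges and forms a diamond. The L2 norm (Euclidean distance) is the square root of the sum of squares, so its unit ball x1^2 + x2^2 ≤ 1 is a circle.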
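A quick numerical check of the two shapes, as a minimal sketch: a point on an axis sits on the boundary of both balls, while a point on the diagonal can be inside the circle yet outside the diamond. The sample points and function names are illustrative.

```python
import math


def l1_norm(x):
    """Manhattan norm: sum of absolute values."""
    return sum(abs(v) for v in x)


def l2_norm(x):
    """Euclidean norm: square root of the sum of squares."""
    return math.sqrt(sum(v * v for v in x))


points = {"on axis": (1.0, 0.0), "on diagonal": (0.6, 0.6)}  # hypothetical points

for name, p in points.items():
    in_l1 = l1_norm(p) <= 1.0
    in_l2 = l2_norm(p) <= 1.0
    print(f"{name}: inside L1 ball = {in_l1}, inside L2 ball = {in_l2}")
```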

Why non-convex loss landscapes are hard: many local minima and saddle points

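A non-convex loss can have many local minima and saddle points. The gradient vanishes at all of them, so gradient-based optimizers can stall or settle on a solution that is not the global minimum.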
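The dependence on the starting point can be seen in one dimension. This sketch assumes a hand-picked non-convex function, f(x) = x^4 - 3x^2 + x, with two minima of different depth. The learning rate, step count, and starting points are arbitrary illustrative values.

```python
def f(x: float) -> float:
    """Toy non-convex loss with a deeper minimum on the left."""
    return x**4 - 3 * x**2 + x


def grad(x: float) -> float:
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x0: float, lr: float = 0.01, steps: int = 1000) -> float:
    """Plain gradient descent from x0."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


x_left = gradient_descent(-2.0)   # settles in the deeper, left minimum
x_right = gradient_descent(2.0)   # settles in the shallower, right minimum

print(x_left, f(x_left))
print(x_right, f(x_right))
```

Both runs stop where the gradient is zero, but only the run started on the left reaches the lower loss value.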

How does the choice of norm affect the shape of the unit ball in a given vector space, specifically comparing the properties of L1 and L∞ norms?

The L1 unit ball is a diamond in 2D (a cross-polytope with 2n vertices in n dimensions), while the L∞ unit ball is a square in 2D (a hypercube with 2^n vertices in n dimensions).
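The difference in shape shows up when the vertices are enumerated. The sketch below lists the vertices of both unit balls for a small dimension n. The helper names are hypothetical, and only the standard library is used.

```python
from itertools import product


def l1_norm(x):
    return sum(abs(v) for v in x)


def linf_norm(x):
    return max(abs(v) for v in x)


def l1_ball_vertices(n: int):
    """Vertices of the L1 unit ball: plus or minus each standard basis vector."""
    vertices = []
    for i in range(n):
        for sign in (1.0, -1.0):
            v = [0.0] * n
            v[i] = sign
            vertices.append(tuple(v))
    return vertices


def linf_ball_vertices(n: int):
    """Vertices of the L-infinity unit ball: every combination of +1 and -1."""
    return list(product((1.0, -1.0), repeat=n))


for n in (2, 3, 4):
    print(n, len(l1_ball_vertices(n)), len(linf_ball_vertices(n)))
```

In 2D both balls have four vertices (the diamond is a rotated, smaller square), but the counts diverge as n grows: 2n for L1 against 2^n for L∞.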

Why SGD with momentum escapes local minima better than vanilla SGD

Momentum SGD accumulates a velocity from past gradients, which can carry the iterate through shallow local minima and flat regions where vanilla SGD would stall.
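A deterministic toy run can isolate the effect of velocity. This sketch uses full gradients with no minibatch noise, so it is gradient descent with heavy-ball momentum rather than true SGD. The function f(x) = x^4 - 3x^2 + x, the learning rate, the momentum coefficient, and the starting point are all illustrative assumptions, and other settings may behave differently.

```python
def f(x: float) -> float:
    """Toy loss: shallow minimum on the right, deeper minimum on the left."""
    return x**4 - 3 * x**2 + x


def grad(x: float) -> float:
    return 4 * x**3 - 6 * x + 1


def descend(x0: float, lr: float, beta: float, steps: int = 1000) -> float:
    """Heavy-ball update; beta = 0 reduces to plain gradient descent."""
    x, velocity = x0, 0.0
    for _ in range(steps):
        velocity = beta * velocity - lr * grad(x)
        x += velocity
    return x


x_plain = descend(2.0, lr=0.01, beta=0.0)     # stops in the shallow right minimum
x_momentum = descend(2.0, lr=0.01, beta=0.9)  # carried over the barrier to the left

print("plain:   ", x_plain, f(x_plain))
print("momentum:", x_momentum, f(x_momentum))
```

With these particular values the accumulated velocity is enough to cross the barrier between the two minima. Momentum does not guarantee this in general.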

How does the curse of dimensionality affect the performance and accuracy of clustering algorithms in high-dimensional datasets?

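In high dimensions, pairwise distances concentrate: the nearest and farthest points become almost equally far away. Distance-based clustering then struggles to separate clusters, and the data become sparse relative to the volume of the space.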
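Distance concentration can be observed with random data. The sketch below assumes points drawn uniformly from the unit hypercube and measures the relative contrast (farthest minus nearest, divided by nearest) from one query point. The sample size, seed, and function name are arbitrary choices.

```python
import math
import random


def relative_contrast(dim: int, n_points: int = 200, seed: int = 0) -> float:
    """(max distance - min distance) / min distance from a random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))
    return (max(distances) - min(distances)) / min(distances)


for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

The contrast shrinks as the dimension grows, which is why nearest-neighbor and centroid distances carry less information for clustering in high-dimensional data.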
