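When to normalize features: when features have different scales and you use distance-based methods

Normalize features when they are on different scales and the method is distance-based (for example k-NN or k-means); otherwise the feature with the largest numeric range dominates the distance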
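A minimal sketch of why scale matters for distances, assuming min-max scaling to [0, 1] as the normalization. The helper names and the age/income rows are hypothetical, chosen only for illustration.

```python
import math

def min_max_scale(rows):
    """Rescale each column of a list of rows to the range [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical rows: [age in years, income in dollars]
people = [[25, 50_000], [60, 50_500], [27, 52_000], [40, 90_000]]

# Raw distances: income differences swamp age differences.
raw_01 = euclidean(people[0], people[1])
raw_02 = euclidean(people[0], people[2])

# After scaling, both features contribute on a comparable scale.
scaled = min_max_scale(people)
scaled_01 = euclidean(scaled[0], scaled[1])
scaled_02 = euclidean(scaled[0], scaled[2])
```

On the raw data, person 1 (35 years older, similar income) looks closer to person 0 than person 2 does; after scaling, person 2 (similar age and income) is the nearer neighbor.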

Related concepts

L1 vs L2 regularization: L1 gives sparsity (feature selection), L2 gives small weights

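L1 regularization penalizes the sum of absolute weights and drives some weights exactly to zero (sparsity, implicit feature selection); L2 regularization penalizes the sum of squared weights and shrinks all weights toward zero without zeroing them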
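One way to see the difference is to compare the shrinkage step each penalty induces on a single weight. This is a sketch under stated assumptions: `l1_shrink` is the soft-thresholding (proximal) step for a penalty `lam * |w|`, and `l2_shrink` is the closed-form minimizer of `0.5 * (x - w)**2 + 0.5 * lam * x**2`. The function names and weight values are hypothetical.

```python
def l1_shrink(w, lam):
    """Soft-thresholding: weights with |w| <= lam become exactly zero."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0

def l2_shrink(w, lam):
    """Multiplicative shrinkage: every weight gets smaller, none hits zero."""
    return w / (1.0 + lam)

weights = [3.0, -0.2, 0.05, -1.5]  # illustrative values
lam = 0.5

l1_result = [l1_shrink(w, lam) for w in weights]
l2_result = [l2_shrink(w, lam) for w in weights]
```

With these values the L1 step zeroes the two small weights and subtracts a constant from the large ones, while the L2 step scales every weight by the same factor and leaves all of them nonzero.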

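Cosine similarity vs dot product: identical for normalized embeddings; cosine is preferred when magnitudes vary

Cosine similarity measures orientation, not magnitude; for unit-length embeddings it equals the dot product, so it matters most when vector norms differ and should not influence the comparison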
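A small sketch comparing the two measures, using plain Python lists as stand-ins for embeddings (the vectors and helper names are hypothetical). It illustrates that cosine similarity ignores magnitude and that, for unit-length vectors, it coincides with the dot product.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (both assumed nonzero)."""
    return dot(a, b) / (norm(a) * norm(b))

def unit(a):
    """Rescale a nonzero vector to length 1."""
    n = norm(a)
    return [x / n for x in a]

a = [1.0, 2.0, 3.0]
b = [10.0, 20.0, 30.0]  # same direction as a, 10x the magnitude
c = [3.0, 2.0, 1.0]     # different direction

cos_ab = cosine_similarity(a, b)   # unaffected by the 10x scale
dot_ab = dot(a, b)                 # grows with magnitude
cos_ac = cosine_similarity(a, c)
dot_units_ac = dot(unit(a), unit(c))  # dot product after normalization
```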

When to standardize: when you need zero mean and unit variance for gradient-based optimization

Standardize features to zero mean and unit variance when training with gradient-based optimization; similarly scaled inputs tend to make convergence faster and more stable
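A minimal sketch of standardization (z-scoring) for a single feature. It assumes the population standard deviation (dividing by n) and a non-constant feature; the function name and sample values are hypothetical.

```python
import math

def standardize(values):
    """Return values shifted to zero mean and scaled to unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var)  # assumes the feature is not constant (std > 0)
    return [(v - mean) / std for v in values]

heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]  # illustrative values
z = standardize(heights_cm)

z_mean = sum(z) / len(z)
z_var = sum((v - z_mean) ** 2 for v in z) / len(z)
```

In practice the mean and standard deviation would be computed on the training split only and then reused for validation and test data.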

Batch norm vs layer norm: BN across batch, LN across features

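Batch norm (BN) normalizes each feature using statistics computed across the batch; layer norm (LN) normalizes each sample using statistics computed across its own features. LN does not depend on batch size, so it handles variable-length sequences and small batches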
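The difference in normalization axis can be sketched on a small batch stored as a list of rows (one row per sample). This is a simplified illustration, not a framework implementation: it omits the learnable scale and shift parameters and the running statistics BN uses at inference time, and all names are hypothetical.

```python
import math

def _normalize(values, eps=1e-5):
    """Zero-mean, unit-variance rescaling of one list of numbers."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def batch_norm(x):
    """Normalize each feature (column) using statistics over the batch (rows)."""
    cols = [_normalize(list(col)) for col in zip(*x)]
    return [list(row) for row in zip(*cols)]

def layer_norm(x):
    """Normalize each sample (row) using statistics over its own features."""
    return [_normalize(row) for row in x]

x = [[1.0, 2.0, 3.0],
     [4.0, 6.0, 8.0]]  # 2 samples, 3 features

bn = batch_norm(x)
ln = layer_norm(x)

# LN on a single sample gives the same result as within the batch;
# BN output for a sample changes when the rest of the batch changes.
ln_single = layer_norm([x[0]])[0]
```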

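Batch size affects generalization: larger batches tend to find sharper minima

Larger batches give more accurate, less noisy gradient estimates, but they tend to converge to sharper minima, which often generalize worse; the gradient noise of smaller batches can help the optimizer settle in flatter minima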
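The gradient-estimate part of this card can be illustrated numerically. The sketch below only shows that mini-batch gradient estimates become less noisy as the batch grows; it does not demonstrate anything about the sharpness of minima. The per-example "gradients" are synthetic Gaussian draws, and every name and constant is a hypothetical choice for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Synthetic per-example gradients: true mean 1.0, standard deviation 2.0
per_example_grads = [random.gauss(1.0, 2.0) for _ in range(10_000)]

def minibatch_grad_spread(batch_size, trials=2_000):
    """Standard deviation of the mini-batch mean gradient over many draws."""
    estimates = [
        statistics.fmean(random.sample(per_example_grads, batch_size))
        for _ in range(trials)
    ]
    return statistics.stdev(estimates)

small_batch_spread = minibatch_grad_spread(8)
large_batch_spread = minibatch_grad_spread(128)
```

The spread shrinks roughly in proportion to one over the square root of the batch size, so the 128-example batches give a noticeably tighter estimate than the 8-example batches.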


Cosine similarity measures the angle between vectors, not their magnitude
