Label smoothing replaces hard labels with soft labels to regularize neural networks

Image: KarinaCor, CC BY-SA 4.0, via Wikimedia Commons

What label smoothing does: replaces the one-hot target [0, 0, 1, 0] with [0.025, 0.025, 0.925, 0.025] (smoothing ε = 0.1 spread uniformly over 4 classes)
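The numbers above can be reproduced with a minimal sketch. The function name `smooth_labels` and the choice ε = 0.1 are assumptions made for illustration: each target is mixed with the uniform distribution over the K classes, so the true class gets 1 - ε + ε/K and every other class gets ε/K.

```python
def smooth_labels(one_hot, epsilon=0.1):
    """Mix a one-hot target with the uniform distribution over classes."""
    k = len(one_hot)
    # Every class receives epsilon / k; the true class keeps (1 - epsilon) on top.
    return [(1.0 - epsilon) * y + epsilon / k for y in one_hot]


smoothed = smooth_labels([0, 0, 1, 0], epsilon=0.1)
print([round(p, 3) for p in smoothed])  # [0.025, 0.025, 0.925, 0.025]
```

The smoothed target still sums to 1, so it can be used directly as the target distribution in a cross-entropy loss.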

Related concepts

Soft targets carry more information than hard labels: they encode class similarities

Soft targets carry more information than hard labels because they encode class similarities
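As a rough illustration, assume the soft targets come from a teacher model's logits passed through a temperature softmax, as in knowledge distillation. The class names, logit values and temperature below are hypothetical. The hard label only says "cat", while the soft target also says that "dog" is a far more plausible confusion than "truck".

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical teacher logits for the classes (cat, dog, truck).
teacher_logits = [5.0, 3.5, -2.0]
hard_label = [1, 0, 0]  # carries no information about dog vs. truck
soft_target = softmax(teacher_logits, temperature=2.0)

for name, p in zip(["cat", "dog", "truck"], soft_target):
    print(f"{name}: {p:.3f}")
```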

The over-smoothing problem in GNNs: stacking many layers makes all node features converge

Over-smoothing in GNNs: as layers are stacked, repeated neighborhood aggregation makes node features converge toward the same values, so nodes lose their distinguishing identities
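The effect can be seen in a toy sketch that keeps only the aggregation step of a GNN layer. This is an assumption made for clarity: there are no learned weights or nonlinearities, and the 4-node path graph is hypothetical. Each round replaces a node's feature with the mean over itself and its neighbors, and the spread between node features shrinks with depth.

```python
def propagate(features, neighbors):
    """One round of mean aggregation over each node and its neighbors."""
    out = []
    for i, nbrs in enumerate(neighbors):
        group = [i] + nbrs  # include a self-loop
        out.append(sum(features[j] for j in group) / len(group))
    return out


def spread(features):
    """Gap between the largest and smallest node feature."""
    return max(features) - min(features)


neighbors = [[1], [0, 2], [1, 3], [2]]  # path graph 0-1-2-3
x = [1.0, 0.0, 0.0, -1.0]               # one scalar feature per node

spread_by_depth = [spread(x)]
for _ in range(20):
    x = propagate(x, neighbors)
    spread_by_depth.append(spread(x))

for depth in (0, 1, 5, 20):
    print(depth, round(spread_by_depth[depth], 4))
```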

Batch norm vs layer norm: BN across batch, LN across features

Batch norm (BN) normalizes each feature across the batch; layer norm (LN) normalizes each sample across its features. Because LN does not depend on batch statistics, it handles variable-length sequences
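The difference comes down to which axis the statistics are computed over. The sketch below is a simplified illustration in plain Python: it omits the learned scale and shift and the running statistics that real implementations use at inference time, and the helper names are assumptions.

```python
import math


def normalize(values, eps=1e-5):
    """Shift to zero mean and scale to (approximately) unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def batch_norm(x):
    """Normalize each feature (column) using statistics over the batch."""
    columns = [normalize(list(col)) for col in zip(*x)]
    return [list(row) for row in zip(*columns)]


def layer_norm(x):
    """Normalize each sample (row) using statistics over its own features."""
    return [normalize(row) for row in x]


# Rows are samples in the batch, columns are features.
x = [[1.0, 10.0, 100.0],
     [2.0, 20.0, 200.0],
     [3.0, 30.0, 300.0]]
bn = batch_norm(x)
ln = layer_norm(x)
```

Since `layer_norm` looks at one row at a time, its output for a sample is the same whatever else is in the batch, which is why it suits sequences of varying length.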

The reverse process of a diffusion model learns: p_θ(x_{t-1} | x_t)

The reverse process learns p_θ(x_{t-1} | x_t), a model that denoises one step at a time
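For reference, one common way to write this out, assuming the Gaussian parameterization used in DDPM-style diffusion models (the mean μ_θ and covariance Σ_θ are network outputs and are not named in the card itself):

```latex
% Joint reverse process: start from noise x_T and denoise step by step.
p_\theta(x_{0:T}) = p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t)

% Each step is modeled as a Gaussian whose parameters are predicted from x_t and t.
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)
```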

Batch size affects generalization: larger batches tend to find sharper minima

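Larger batch sizes tend to converge to sharper minima, which often generalize worse. Their gradient estimates are more accurate, but the reduced gradient noise removes a regularizing effect that helps small batches settle in flatter minima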
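The statement about gradient accuracy can be checked with a small simulation. This sketch uses synthetic per-example gradients for a single parameter (a hypothetical setup, not real training) and measures how much the mini-batch gradient estimate fluctuates at two batch sizes. It only illustrates gradient noise and does not model sharpness or generalization.

```python
import random
import statistics

random.seed(0)

# Hypothetical per-example gradients of one parameter: true mean 1.0, noisy.
per_example_grads = [random.gauss(1.0, 2.0) for _ in range(10_000)]


def minibatch_grad_std(batch_size, trials=500):
    """Standard deviation of the mini-batch gradient estimate across random batches."""
    estimates = [
        statistics.fmean(random.sample(per_example_grads, batch_size))
        for _ in range(trials)
    ]
    return statistics.stdev(estimates)


noise_small = minibatch_grad_std(8)
noise_large = minibatch_grad_std(512)
print(f"batch 8:   gradient noise {noise_small:.3f}")
print(f"batch 512: gradient noise {noise_large:.3f}")
```

The noise shrinks roughly with the square root of the batch size, so large batches take steadier steps while small batches keep more randomness in their updates.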

L1 vs L2 regularization: L1 gives sparsity (feature selection), L2 gives small weights

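L1 regularization penalizes the sum of absolute weight values, which drives many weights exactly to zero (sparsity, implicit feature selection); L2 regularization penalizes the sum of squared weights, which shrinks all weights toward zero without making them exactly zero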
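One way to see the difference is through the proximal step associated with each penalty: soft-thresholding for L1 and uniform shrinkage for L2. The weights and penalty strength below are hypothetical, and the sketch isolates the effect of the penalty from any data-fitting term.

```python
import math


def l1_prox(weights, strength):
    """Soft-thresholding: proximal step for the penalty strength * sum(|w|)."""
    return [math.copysign(max(abs(w) - strength, 0.0), w) for w in weights]


def l2_prox(weights, strength):
    """Uniform shrinkage: proximal step for the penalty (strength / 2) * sum(w**2)."""
    return [w / (1.0 + strength) for w in weights]


weights = [0.05, -0.3, 0.8, -0.02, 1.5]
l1_weights = l1_prox(weights, strength=0.1)
l2_weights = l2_prox(weights, strength=0.1)

print("L1 zeros:", sum(w == 0 for w in l1_weights))
print("L2 zeros:", sum(w == 0 for w in l2_weights))
```

Weights smaller in magnitude than the L1 strength are set exactly to zero, while the L2 step only rescales every weight by the same factor.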
