![Label smoothing replaces one-hot [0, 0, 1, 0] with [0.025, 0.025, 0.925, 0.025]](https://upload.wikimedia.org/wikipedia/commons/e/ee/Neurolinguistics.png)
Label smoothing replaces hard labels with soft labels to regularize neural networks
Image: KarinaCor, CC BY-SA 4.0, via Wikimedia Commons
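The smoothed vector above can be reproduced with a few lines of Python. This is a minimal sketch that assumes the common formulation in which the one-hot target is mixed with a uniform distribution over the K classes; the function name `smooth_labels` and the value `epsilon=0.1` are illustrative choices, not taken from the source.

```python
def smooth_labels(one_hot, epsilon=0.1):
    """Mix a one-hot target with a uniform distribution over the K classes."""
    k = len(one_hot)
    # Each class gets epsilon / K; the true class keeps the remaining 1 - epsilon on top.
    return [(1.0 - epsilon) * y + epsilon / k for y in one_hot]


smoothed = smooth_labels([0, 0, 1, 0], epsilon=0.1)
print([round(p, 3) for p in smoothed])  # the entries still sum to 1
```

With four classes and `epsilon=0.1`, each wrong class receives 0.1 / 4 = 0.025 and the true class receives 0.9 + 0.025 = 0.925.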
soft targets carry more information than hard labels: they encode class similarities
Soft targets carry more information than hard labels because the relative probabilities assigned to the non-target classes encode class similarities
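One way to see this is to compare a hard label with a temperature-softened softmax over the same scores. The sketch below assumes soft targets come from a model's logits, as in knowledge distillation; the class names, logit values, and temperature are made up for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax with a temperature; higher temperatures give softer distributions."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


classes = ["cat", "dog", "truck"]   # hypothetical classes
logits = [5.0, 4.0, -2.0]           # hypothetical scores for an image of a cat

hard = [1.0 if z == max(logits) else 0.0 for z in logits]
soft = softmax(logits, temperature=2.0)

for name, h, s in zip(classes, hard, soft):
    print(f"{name:>5}: hard={h:.0f} soft={s:.3f}")
```

The hard label treats "dog" and "truck" as equally wrong, while the soft target gives "dog" far more probability than "truck", which is the similarity information a hard label discards.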
the over-smoothing problem is in GNNs: deep GNNs make all node features converge
Over-smoothing in GNNs: as layers are stacked, repeated neighborhood aggregation drives node features toward the same values, so nodes lose their distinguishing identities
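The effect can be illustrated with aggregation alone. The sketch below is a simplification, not a real GNN: it assumes a parameter-free layer that averages each node with its neighbors (no learned weights or nonlinearity) on a hypothetical four-node path graph, and tracks how far apart the node features remain.

```python
def propagate(features, neighbors):
    """One parameter-free layer: replace each node's feature with the mean over itself and its neighbors."""
    out = []
    for node, nbrs in enumerate(neighbors):
        group = [node] + nbrs
        out.append(sum(features[j] for j in group) / len(group))
    return out


def spread(values):
    """Gap between the largest and smallest node feature."""
    return max(values) - min(values)


# Hypothetical path graph 0 - 1 - 2 - 3 with one scalar feature per node.
neighbors = [[1], [0, 2], [1, 3], [2]]
features = [1.0, 0.0, 0.0, -1.0]

spreads = [spread(features)]
for _ in range(20):
    features = propagate(features, neighbors)
    spreads.append(spread(features))

print(f"spread before any layer: {spreads[0]:.4f}")
print(f"spread after 20 layers:  {spreads[-1]:.4f}")
```

The spread shrinks with every layer, so after enough layers the nodes are nearly indistinguishable. Remedies such as residual connections are outside the scope of this sketch.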
Batch norm vs layer norm: BN across batch, LN across features
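Batch norm (BN) normalizes each feature using statistics computed across the batch; layer norm (LN) normalizes each sample using statistics computed across its own features. Because LN does not depend on the other samples in the batch, it handles variable-length sequences and small batches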
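The difference comes down to which axis the mean and variance are taken over. The following sketch works on a small batch stored as a list of rows; it omits the learnable scale and shift and the running statistics that real BN layers keep, and the helper names are illustrative.

```python
def standardize(values, eps=1e-5):
    """Subtract the mean and divide by the standard deviation of `values`."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


def batch_norm(x):
    """Normalize each feature (column) with statistics taken over the batch (rows)."""
    columns = [standardize(list(col)) for col in zip(*x)]
    return [list(row) for row in zip(*columns)]


def layer_norm(x):
    """Normalize each sample (row) with statistics taken over its own features."""
    return [standardize(row) for row in x]


x = [[1.0, 2.0, 3.0],   # sample 0
     [4.0, 6.0, 8.0]]   # sample 1

bn = batch_norm(x)  # every column now has mean 0
ln = layer_norm(x)  # every row now has mean 0
print("BN:", [[round(v, 2) for v in row] for row in bn])
print("LN:", [[round(v, 2) for v in row] for row in ln])
```

Calling `layer_norm` on a single sample gives the same result as calling it on that sample inside a larger batch, whereas the output of `batch_norm` changes with the batch contents.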
the reverse process learns: p_θ(x_{t-1}|x_t)
The reverse process learns p_θ(x_{t-1}|x_t), removing noise one step at a time
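A single reverse step can be sketched for a scalar sample. The source does not say which parameterization is meant, so this assumes the DDPM form in which a network predicts the added noise and the step variance equals β_t; `dummy_predict_noise` is a hypothetical stand-in for a trained network, and the linear `betas` schedule is an illustrative choice.

```python
import math
import random


def reverse_step(x_t, t, betas, predict_noise, rng):
    """Draw x_{t-1} from p_theta(x_{t-1} | x_t), assuming the DDPM noise-prediction form."""
    beta_t = betas[t]
    alpha_t = 1.0 - beta_t
    alpha_bar_t = math.prod(1.0 - b for b in betas[: t + 1])
    eps_hat = predict_noise(x_t, t)
    # Mean of the Gaussian p_theta(x_{t-1} | x_t).
    mean = (x_t - beta_t / math.sqrt(1.0 - alpha_bar_t) * eps_hat) / math.sqrt(alpha_t)
    if t == 0:
        return mean  # no noise is added on the final step
    return mean + math.sqrt(beta_t) * rng.gauss(0.0, 1.0)


def dummy_predict_noise(x_t, t):
    """Hypothetical placeholder; a real model would be a trained neural network."""
    return 0.0


# Illustrative linear noise schedule with 10 steps.
betas = [0.0001 + i * (0.02 - 0.0001) / 9 for i in range(10)]

rng = random.Random(0)
x = rng.gauss(0.0, 1.0)  # start from pure noise
for t in reversed(range(len(betas))):
    x = reverse_step(x, t, betas, dummy_predict_noise, rng)
print(x)
```

With a trained noise predictor in place of the placeholder, the loop would walk from pure noise back toward a data sample, one denoising step per iteration.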
batch size affects generalization: larger batches find sharper minima
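Larger batch sizes give more accurate (lower-variance) gradient estimates, but they tend to converge to sharper minima, which are associated with worse generalization; the gradient noise of small batches acts as an implicit regularizer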
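The gradient-accuracy part of this claim is easy to check numerically. The sketch below uses a made-up one-dimensional regression problem and measures how much the mini-batch gradient varies from batch to batch at two batch sizes; it says nothing about the sharpness of minima, which would need a real training run to observe.

```python
import random
from statistics import pstdev

rng = random.Random(0)

# Toy data: y = 3x + noise (illustrative values only).
data = []
for _ in range(2000):
    x = rng.uniform(-1.0, 1.0)
    data.append((x, 3.0 * x + rng.gauss(0.0, 1.0)))


def minibatch_gradient(w, batch):
    """Gradient of the mean of 0.5 * (w * x - y)**2 with respect to w."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)


def gradient_noise(w, batch_size, trials=200):
    """Standard deviation of the mini-batch gradient across random batches."""
    grads = [minibatch_gradient(w, rng.sample(data, batch_size)) for _ in range(trials)]
    return pstdev(grads)


noise_small = gradient_noise(w=0.0, batch_size=8)
noise_large = gradient_noise(w=0.0, batch_size=512)
print(f"batch size   8: gradient std = {noise_small:.3f}")
print(f"batch size 512: gradient std = {noise_large:.3f}")
```

The small-batch gradient is much noisier. That noise is what is usually credited with steering training away from sharp minima.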
L1 vs L2 regularization: L1 gives sparsity (feature selection), L2 gives small weights
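L1 regularization penalizes the sum of absolute weights, which drives some weights to exactly zero (sparsity, so it acts as feature selection); L2 regularization penalizes the sum of squared weights, which shrinks all weights toward zero without making them exactly zero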
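The contrast shows up clearly in the proximal update each penalty induces on a single weight. This is a minimal sketch under that framing: `l1_prox` is soft-thresholding for the penalty `strength * |w|`, and `l2_shrink` is the corresponding update for `0.5 * strength * w**2`. The weights and strength are made-up values.

```python
def l1_prox(w, strength):
    """Soft-thresholding: move w toward zero by `strength`, clamping at exactly zero."""
    if w > strength:
        return w - strength
    if w < -strength:
        return w + strength
    return 0.0


def l2_shrink(w, strength):
    """Scale w toward zero by a constant factor; it never reaches exactly zero."""
    return w / (1.0 + strength)


weights = [2.0, -0.5, 0.05, -0.01]  # hypothetical weights
l1_result = [l1_prox(w, 0.1) for w in weights]
l2_result = [l2_shrink(w, 0.1) for w in weights]

print("L1:", [round(v, 4) for v in l1_result])
print("L2:", [round(v, 4) for v in l2_result])
```

Under L1 the two small weights become exactly zero while the large ones lose a fixed amount; under L2 every weight is scaled by the same factor and all remain nonzero.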