Soft targets carry more information than hard labels because they encode class similarities

Image: Unknown author, Public domain, via Wikimedia Commons

A soft target is a probability distribution over all classes rather than a one-hot vector, so the probability it assigns to the non-target classes shows which of them resemble the target
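One common way to produce soft targets is a temperature-scaled softmax over a model's logits, as used in knowledge distillation. The sketch below assumes a hypothetical three-class problem with made-up logits; the class names, scores, and temperature are illustrative only.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

classes = ["cat", "dog", "truck"]  # hypothetical label set
logits = [4.0, 3.0, -2.0]          # made-up scores for a cat image

hard_label = [1.0, 0.0, 0.0]                    # one-hot: says only "cat"
soft_target = softmax(logits, temperature=2.0)  # also says "closer to dog than to truck"

for name, hard, soft in zip(classes, hard_label, soft_target):
    print(f"{name:>5}: hard={hard:.2f} soft={soft:.3f}")
```

The hard label treats "dog" and "truck" as equally wrong, while the soft target gives "dog" noticeably more probability than "truck".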

Related concepts

cosine similarity is preferred over dot product for normalized embeddings

Cosine similarity compares orientation and ignores magnitude, so vector length cannot skew the score; on unit-normalized embeddings it coincides with the dot product
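A minimal sketch of the difference, using plain Python lists as stand-ins for embeddings. The vectors are made up: `v` points in the same direction as `u` but is ten times longer.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def cosine_similarity(a, b):
    """Dot product divided by both lengths, so only direction matters."""
    return dot(a, b) / (norm(a) * norm(b))

def normalize(a):
    """Scale a vector to unit length."""
    n = norm(a)
    return [x / n for x in a]

u = [1.0, 2.0, 3.0]
v = [10.0, 20.0, 30.0]  # same direction as u, ten times longer

print(dot(u, v))                        # grows with vector length
print(cosine_similarity(u, v))          # depends only on direction
print(dot(normalize(u), normalize(v)))  # matches the cosine after normalization
```

Scaling `v` changes the raw dot product but leaves the cosine untouched, which is why the two measures only agree once the embeddings are normalized.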

causal masking hides future tokens from the decoder

Causal masking prevents attention to future tokens in the decoder
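A minimal sketch of a causal mask applied to attention scores, in plain Python. The 3x3 score matrix is made up, and `causal_mask` and `masked_softmax` are illustrative helper names, not any library's API.

```python
import math

def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]

def masked_softmax(scores, mask):
    """Row-wise softmax that gives zero weight to masked-out positions."""
    weights = []
    for row, allowed in zip(scores, mask):
        # Masked scores become -inf, so exp() maps them to exactly 0.
        kept = [s if ok else -math.inf for s, ok in zip(row, allowed)]
        peak = max(kept)  # finite, because each position can attend to itself
        exps = [math.exp(s - peak) for s in kept]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

scores = [
    [0.5, 1.0, 2.0],
    [1.5, 0.2, 0.3],
    [0.1, 0.4, 0.9],
]  # made-up query-key scores for a 3-token sequence

attention = masked_softmax(scores, causal_mask(3))
for row in attention:
    print([round(w, 3) for w in row])
```

Everything above the diagonal comes out as zero: the first token attends only to itself, the second to the first two, and so on.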

mean pooling often outperforms [CLS] for sentence similarity tasks

Mean pooling averages every token embedding, which often captures overall sentence meaning better than the single [CLS] token embedding
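A minimal sketch of masked mean pooling, assuming token embeddings arrive as a list of vectors with a 0/1 attention mask marking padding. The numbers are made up, and treating the first row as the [CLS] vector is an assumption for illustration.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask value is 1, skipping padding."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for k in range(dim):
                summed[k] += vector[k]
    return [s / max(count, 1) for s in summed]  # max() guards an all-padding input

tokens = [
    [1.0, 0.0],  # assumed [CLS] position
    [0.2, 0.4],
    [0.6, 0.8],
    [9.0, 9.0],  # padding row, must not affect the result
]
mask = [1, 1, 1, 0]

cls_embedding = tokens[0]
sentence_embedding = mean_pool(tokens, mask)
print(cls_embedding, sentence_embedding)
```

The mask matters: averaging the padding row as well would drag the sentence embedding toward values that carry no meaning.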

the vocabulary size matters: larger vocab = shorter sequences but more parameters

A larger vocabulary splits text into fewer tokens, which shortens sequences, but it adds parameters to the embedding table and the output layer
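The parameter side of the trade-off is simple arithmetic: an embedding table holds one vector per vocabulary entry. The sketch below assumes a hidden size of 768 and a few round vocabulary sizes, all chosen for illustration; `embedding_parameters` is a hypothetical helper.

```python
def embedding_parameters(vocab_size, hidden_size, tied=True):
    """Parameters in the input embedding table, plus the output projection if untied."""
    table = vocab_size * hidden_size
    return table if tied else 2 * table

hidden_size = 768  # illustrative width

for vocab_size in (8_000, 32_000, 128_000):
    tied = embedding_parameters(vocab_size, hidden_size)
    untied = embedding_parameters(vocab_size, hidden_size, tied=False)
    print(f"vocab={vocab_size:>7,}  tied={tied:>12,}  untied={untied:>12,}")
```

The cost grows linearly with vocabulary size, and doubles when the output projection does not share weights with the input embeddings.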

batch size affects generalization: larger batches find sharper minima

Larger batches give more accurate gradient estimates, but the reduced gradient noise tends to steer training toward sharper minima, which are associated with worse generalization
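The gradient-noise half of this claim can be checked on a toy problem. The sketch below assumes a one-parameter model `y = w * x` on synthetic data and measures how much the mini-batch gradient varies from batch to batch; it says nothing about the sharpness of minima, which needs a real training run to observe.

```python
import random
import statistics

random.seed(0)

# Synthetic data: y = 3x plus unit Gaussian noise (made up for illustration).
xs = [random.uniform(-1.0, 1.0) for _ in range(2000)]
data = [(x, 3.0 * x + random.gauss(0.0, 1.0)) for x in xs]

def example_gradient(w, x, y):
    """d/dw of (w * x - y) ** 2 for a single example."""
    return 2.0 * (w * x - y) * x

def batch_gradient_std(batch_size, w=0.0, trials=500):
    """Standard deviation of the mini-batch gradient across random batches."""
    estimates = []
    for _ in range(trials):
        batch = random.sample(data, batch_size)
        grads = [example_gradient(w, x, y) for x, y in batch]
        estimates.append(statistics.mean(grads))
    return statistics.stdev(estimates)

small = batch_gradient_std(8)
large = batch_gradient_std(128)
print(f"batch=8:   gradient std = {small:.3f}")
print(f"batch=128: gradient std = {large:.3f}")
```

The larger batch gives a much steadier gradient estimate. That same reduction in noise is one proposed reason large-batch training settles into sharper minima.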

autoencoders learn the data manifold

Autoencoders learn the data manifold by forcing information through a bottleneck layer, which leaves room only for a compact representation of the input
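A toy sketch of the bottleneck idea: a linear autoencoder with a single tied weight vector, trained by gradient descent on 2-D points that all lie on the line y = 2x. The data, learning rate, and epoch count are assumptions chosen for illustration; real autoencoders use nonlinear layers and a deep learning framework.

```python
# Points on a 1-D manifold (the line y = 2x) embedded in 2-D.
points = [(i / 10, 2 * i / 10) for i in range(-10, 11)]

w = [1.0, 0.0]        # tied weights: encoder and decoder share one direction
learning_rate = 0.01  # illustrative value

def encode(x):
    """Project a 2-D point onto the 1-D bottleneck."""
    return w[0] * x[0] + w[1] * x[1]

def decode(z):
    """Map the 1-D code back to 2-D."""
    return [z * w[0], z * w[1]]

def reconstruction_error():
    """Mean squared distance between each point and its reconstruction."""
    total = 0.0
    for x in points:
        x_hat = decode(encode(x))
        total += (x[0] - x_hat[0]) ** 2 + (x[1] - x_hat[1]) ** 2
    return total / len(points)

error_before = reconstruction_error()

for epoch in range(200):
    for x in points:
        z = encode(x)
        x_hat = decode(z)
        r = [x[0] - x_hat[0], x[1] - x_hat[1]]  # reconstruction residual
        r_dot_w = r[0] * w[0] + r[1] * w[1]
        # Descent step on ||x - decode(encode(x))||^2 (constant factor folded into the rate).
        w[0] += learning_rate * (z * r[0] + r_dot_w * x[0])
        w[1] += learning_rate * (z * r[1] + r_dot_w * x[1])

error_after = reconstruction_error()
print(f"error before: {error_before:.4f}, after: {error_after:.2e}")
print(f"learned direction: {w}")
```

With a one-number bottleneck, the only way to reconstruct the points is to align `w` with the line they lie on, so the learned direction ends up proportional to (1, 2).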
