Cosine similarity measures orientation, not magnitude, making it ideal for normalized embeddings
Image: cavebear42, CC BY-SA 4.0, via Wikimedia Commons
Cosine similarity measures the angle between vectors, not their magnitude
Cosine similarity often works better than Euclidean distance in high dimensions
Cosine similarity measures orientation, not magnitude, so differences in vector norm do not change the score; this often makes it more robust than Euclidean distance when comparing high-dimensional embeddings
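To make the orientation-versus-magnitude point concrete, here is a minimal sketch using NumPy. The helper name `cosine_similarity` and the example vectors are illustrative assumptions, not taken from the original card.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

v = np.array([1.0, 2.0, 3.0])
scaled = 10.0 * v  # same direction, ten times the magnitude

# Cosine similarity ignores the scale difference...
cos_same_direction = cosine_similarity(v, scaled)

# ...while Euclidean distance grows with it.
euclidean_gap = float(np.linalg.norm(v - scaled))
```

Here the two vectors point the same way, so the cosine score is at its maximum even though the Euclidean distance between them is large. For unit-length vectors the two measures rank neighbours identically, since the squared distance equals 2 minus twice the cosine.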
ALiBi extrapolates to longer sequences better than learned position embeddings
ALiBi adds a distance-proportional linear bias to attention scores instead of adding position embeddings to the input, so it has no fixed-size position table and can handle sequences longer than those seen in training
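As a rough sketch of the idea, the bias that ALiBi adds to the attention scores can be built like this in NumPy. The function names are hypothetical, the slope formula assumes a power-of-two head count, and a causal mask for future positions is assumed to be applied elsewhere.

```python
import numpy as np

def alibi_slopes(num_heads):
    # Geometric sequence of per-head slopes (assumes num_heads is a power of two):
    # with 8 heads this gives 1/2, 1/4, ..., 1/256.
    return np.array([2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])

def alibi_bias(num_heads, seq_len):
    positions = np.arange(seq_len)
    # distance[i, j] = j - i, which is <= 0 for keys at or before the query
    distance = positions[None, :] - positions[:, None]
    # Future positions (j > i) get zero bias here; a causal mask is assumed to hide them
    distance = np.minimum(distance, 0)
    # Shape: (num_heads, seq_len, seq_len); added to q @ k.T before the softmax
    return alibi_slopes(num_heads)[:, None, None] * distance[None, :, :]

bias = alibi_bias(num_heads=8, seq_len=4)
longer = alibi_bias(num_heads=8, seq_len=16)  # no learned table to resize
```

Because the bias depends only on the distance between query and key, the same function produces a valid bias for any sequence length, which is what makes extrapolation possible in principle.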
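768-dim BERT embeddings capture bidirectional context from masked language modeling
BERT-base produces 768-dimensional token embeddings; because masked language modeling predicts a hidden token from both its left and right context, each embedding reflects the words on both sides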
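Soft targets carry more information than hard labels: they encode class similarities
Soft targets are the full probability distribution predicted by a teacher model; the relative probabilities assigned to the wrong classes show which classes the teacher considers similar, which a one-hot hard label cannot express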
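The following sketch contrasts a hard label with temperature-softened soft targets. The class names, logit values, and temperature are made-up illustrations, not figures from the original card.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

classes = ["cat", "dog", "truck"]
teacher_logits = [5.0, 3.0, -2.0]  # hypothetical teacher outputs for a cat image

# Hard label: one-hot, says nothing about how the wrong classes relate
hard_label = np.eye(len(classes))[int(np.argmax(teacher_logits))]

# Soft targets: a higher temperature spreads probability over the wrong classes
soft_targets = softmax(teacher_logits, temperature=2.0)
```

In this example the soft targets give "dog" noticeably more probability than "truck", which tells a student model that cats resemble dogs more than trucks. The hard label treats both wrong classes identically.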
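Mean pooling often outperforms [CLS] for sentence similarity tasks
Averaging all token embeddings often captures overall sentence meaning better than the [CLS] token embedding alone, particularly when the model has not been fine-tuned for sentence-level tasks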
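A minimal NumPy sketch of the two pooling strategies is shown below. It assumes token embeddings of shape (batch, seq_len, dim) and a 0/1 attention mask marking padding; the function names and the toy values are illustrative assumptions.

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, ignoring padding positions."""
    mask = np.asarray(attention_mask, dtype=float)[..., None]  # (batch, seq, 1)
    summed = (np.asarray(token_embeddings, dtype=float) * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts

def cls_pool(token_embeddings):
    """Take the first token's vector, where BERT-style models place [CLS]."""
    return np.asarray(token_embeddings)[:, 0, :]

# One sentence, four positions, 2-dim embeddings; the last position is padding
tokens = np.array([[[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [9.0, 9.0]]])
mask = np.array([[1, 1, 1, 0]])

sentence_vec = mean_pool(tokens, mask)  # averages the three real tokens
cls_vec = cls_pool(tokens)              # uses only the first token
```

Masking matters here: without it, the padding vector would be averaged into the sentence embedding and skew the similarity scores.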