Sinusoidal position encoding assigns a distinct frequency to each sine/cosine pair of dimensions, so every position receives a unique pattern the model can tell apart
Image: Dan Leveille (danlev on Wikimedia), CC BY-SA 3.0, via Wikimedia Commons
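A minimal sketch of a sinusoidal position table may make the card above more concrete. It assumes the common formulation with a base of 10000 and an even model width; the function name `sinusoidal_encoding` and the sizes used are illustrative only.

```python
import numpy as np

def sinusoidal_encoding(num_positions: int, d_model: int, base: float = 10000.0) -> np.ndarray:
    """Return a (num_positions, d_model) table; d_model is assumed to be even."""
    positions = np.arange(num_positions)[:, None]       # shape (P, 1)
    pair_index = np.arange(d_model // 2)[None, :]       # shape (1, d_model / 2)
    freqs = base ** (-2.0 * pair_index / d_model)       # one frequency per sin/cos pair
    angles = positions * freqs                          # shape (P, d_model / 2)
    table = np.empty((num_positions, d_model))
    table[:, 0::2] = np.sin(angles)                     # even dimensions carry the sine
    table[:, 1::2] = np.cos(angles)                     # odd dimensions carry the cosine
    return table

# Illustrative sizes, not taken from any particular model.
pe = sinusoidal_encoding(num_positions=50, d_model=16)
```

Each row of `pe` is the encoding of one position; the low-index pairs oscillate quickly and the high-index pairs slowly, which is what keeps the rows distinct.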
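What rotary position embeddings (RoPE) do
RoPE encodes relative position by rotating query and key vectors through position-dependent angles, so their dot product depends only on the offset between the two positions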
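The rotation idea can be sketched in a few lines. This is an illustrative version that rotates adjacent dimension pairs with a base of 10000; the helper name `rope` and the random vectors are assumptions made for the example, not any library's API.

```python
import numpy as np

def rope(x: np.ndarray, position: int, base: float = 10000.0) -> np.ndarray:
    """Rotate each adjacent pair of dimensions of a 1-D vector by a position-dependent angle."""
    d = x.shape[-1]                                     # assumed even
    freqs = base ** (-2.0 * np.arange(d // 2) / d)      # one rotation speed per pair
    angles = position * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x_even * cos - x_odd * sin              # standard 2-D rotation
    out[1::2] = x_even * sin + x_odd * cos
    return out

rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)

# Same offset (3) at very different absolute positions.
score_near = rope(q, 5) @ rope(k, 2)
score_far = rope(q, 105) @ rope(k, 102)
```

The two scores agree up to floating-point error, which is the sense in which the encoding is relative: only the offset between query and key positions affects the dot product.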
Why cosine similarity often works better than Euclidean distance in high dimensions
Cosine similarity measures orientation, not magnitude, so it is unaffected by differences in vector length, which often carry little meaning in high-dimensional spaces
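A small sketch of the orientation-versus-magnitude point, using a random 512-dimensional vector and a scaled copy of it. The dimension and the scale factor are arbitrary choices for illustration.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
a = rng.normal(size=512)
b = 10.0 * a                                # same direction, ten times the length

cos_ab = cosine_similarity(a, b)            # depends on direction only
dist_ab = float(np.linalg.norm(a - b))      # grows with the difference in length
```

Here `cos_ab` is 1 up to rounding because the vectors point the same way, while `dist_ab` is large purely because of the scale difference.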
What wavelets provide over Fourier: both time and frequency localization
Wavelets localize a signal in both time and frequency, whereas the Fourier transform localizes it only in frequency because its basis functions span the whole signal
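One way to see the difference is to transform a single spike. The sketch below uses one level of the Haar wavelet, written by hand in NumPy rather than taken from a wavelet library; `haar_step` is a hypothetical helper name and the signal length is arbitrary.

```python
import numpy as np

def haar_step(signal: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """One level of the orthonormal Haar transform (signal length assumed even)."""
    pairs = signal.reshape(-1, 2)
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)   # local averages
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)   # local differences
    return approx, detail

signal = np.zeros(16)
signal[5] = 1.0                                         # a single spike in time

approx, detail = haar_step(signal)
spectrum = np.abs(np.fft.fft(signal))

nonzero_detail = int(np.count_nonzero(detail))          # coefficients touched by the spike
nonzero_fourier = int(np.count_nonzero(spectrum > 1e-12))
```

The spike shows up in a single Haar detail coefficient, at the index covering samples 4 and 5, but it spreads across every Fourier coefficient, so the Fourier magnitudes alone do not reveal where in time it occurred.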
Why ALiBi extrapolates to longer sequences better than learned position embeddings
ALiBi adds a penalty proportional to query-key distance directly to the attention scores instead of learning a fixed-size table of position embeddings, so it is not tied to the sequence lengths seen during training
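A rough sketch of such a distance-based bias follows. It assumes a causal setting and a geometric schedule of per-head slopes for a power-of-two head count; the function name `alibi_bias` and the sizes are illustrative, and masking of future positions is assumed to happen elsewhere.

```python
import numpy as np

def alibi_bias(seq_len: int, num_heads: int) -> np.ndarray:
    """Return a (num_heads, seq_len, seq_len) bias to add to attention logits."""
    # Assumed geometric slope schedule: 2^(-8/n), 2^(-16/n), ..., 2^(-8).
    slopes = 2.0 ** (-8.0 * np.arange(1, num_heads + 1) / num_heads)
    positions = np.arange(seq_len)
    distance = positions[None, :] - positions[:, None]  # key index minus query index
    distance = np.minimum(distance, 0)                  # only past keys are penalized here
    return slopes[:, None, None] * distance[None, :, :]

bias = alibi_bias(seq_len=6, num_heads=8)
longer = alibi_bias(seq_len=100, num_heads=8)           # no learned table to outgrow
```

Because the bias is computed from distances rather than looked up in a learned table, the same function produces a valid bias for any sequence length.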
Principal component analysis
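The eigenvectors of the data's covariance matrix point along the directions of maximum variance, and each eigenvalue gives the variance along its direction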
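As a small illustration, the sketch below builds synthetic 2-D data stretched along the diagonal direction (1, 1) and recovers that direction from the covariance matrix. The data, sample count, and variable names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: wide along one axis, narrow along the other, then rotated by 45 degrees.
latent = rng.normal(size=(2000, 2)) * np.array([3.0, 0.3])
rotation = np.array([[1.0, 1.0], [-1.0, 1.0]]) / np.sqrt(2)
data = latent @ rotation

centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)

# eigh returns eigenvalues in ascending order for symmetric matrices.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
first_component = eigenvectors[:, -1]                   # direction of largest variance

projected_variance = np.var(centered @ first_component, ddof=1)
```

`first_component` lines up with the diagonal direction (up to sign), and the variance of the data projected onto it matches the largest eigenvalue.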
Why transformers use LayerNorm, not BatchNorm
LayerNorm normalizes each token across its own feature dimension, so it works with any batch size or sequence length, unlike BatchNorm, which relies on statistics computed across the batch
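The independence from the rest of the batch can be checked directly. This is a bare-bones layer normalization without the learnable scale and shift that real layers usually include; the shapes are arbitrary.

```python
import numpy as np

def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize each row over its last (feature) axis; no learnable parameters."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
token = rng.normal(size=(1, 8))                          # one token with 8 features
batch = np.concatenate([token, rng.normal(size=(3, 8))], axis=0)

alone = layer_norm(token)[0]                             # normalized by itself
in_batch = layer_norm(batch)[0]                          # normalized alongside other rows
```

The token's normalized values are identical in both cases, since the statistics come only from its own features. A batch-statistics approach would give different results depending on which other rows are present.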