Image: Dan Leveille (danlev on Wikimedia), CC BY-SA 3.0, via Wikimedia Commons

How sinusoidal position encoding works: each dimension pair has a different frequency

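Sinusoidal position encoding gives each pair of dimensions a sine and a cosine of a distinct frequency, with wavelengths spaced geometrically from short to long. Every position therefore receives a distinct pattern of values, which lets the model tell positions apart.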
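As a minimal sketch of the idea, the snippet below computes the encoding for a single position in plain Python. The function name `sinusoidal_encoding`, the interleaved sin/cos layout, and the base of 10000 are assumptions for illustration (the base follows the common Transformer convention).

```python
import math

def sinusoidal_encoding(position, d_model, base=10000.0):
    """Return the encoding for one position; d_model is assumed to be even."""
    encoding = []
    for i in range(0, d_model, 2):
        # Each (sin, cos) pair shares one frequency, which falls as i grows.
        freq = 1.0 / (base ** (i / d_model))
        encoding.append(math.sin(position * freq))
        encoding.append(math.cos(position * freq))
    return encoding

pe0 = sinusoidal_encoding(0, 8)  # position 0
pe5 = sinusoidal_encoding(5, 8)  # position 5 yields a different pattern
```

The low-index pairs oscillate quickly and separate nearby positions, while the high-index pairs change slowly and separate distant ones.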

Related concepts

What rotary position embeddings (RoPE) do

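RoPE rotates each pair of query and key features by an angle proportional to the token's position. Because rotations compose, the query-key dot product depends only on the relative offset between the two positions.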
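The relative-position property can be checked with a small sketch. The helper names `rope_rotate` and `dot`, the pairing of adjacent features, and the base of 10000 are illustrative assumptions, not a reference implementation.

```python
import math

def rope_rotate(vec, position, base=10000.0):
    """Rotate each consecutive feature pair by an angle proportional to position."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = position / (base ** (i / d))
        x, y = vec[i], vec[i + 1]
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [1.0, 0.5, -0.3, 2.0]
k = [0.2, -1.0, 0.7, 0.4]

# Both pairs of positions are 2 apart, so the attention scores should match.
score_a = dot(rope_rotate(q, 3), rope_rotate(k, 1))
score_b = dot(rope_rotate(q, 10), rope_rotate(k, 8))
```

Shifting both positions by the same amount leaves the score unchanged, up to floating-point error.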

Why cosine similarity works better than Euclidean distance in high dimensions

Cosine similarity measures orientation, not magnitude, so differences in vector norm do not affect it. This makes it more robust than Euclidean distance to scale and to irrelevant dimensions in high-dimensional spaces.
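A small sketch of the difference, using two made-up vectors that point the same way but differ in length. The function names are illustrative.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a = [1.0, 2.0, 3.0]
b = [10.0, 20.0, 30.0]  # same direction as a, ten times the magnitude

cos_ab = cosine_similarity(a, b)    # close to 1: identical orientation
dist_ab = euclidean_distance(a, b)  # large: dominated by the norm gap
```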

What wavelets provide over Fourier: both time and frequency localization

Wavelets provide both time and frequency localization, unlike the Fourier transform, which localizes in frequency only: its basis functions span the whole signal, so it cannot say when a feature occurs.
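To illustrate, the sketch below compares one level of Haar wavelet detail coefficients with discrete Fourier magnitudes for a single spike. The Haar wavelet is chosen here only because it is the simplest case; the function names and the test signal are assumptions.

```python
import cmath
import math

def haar_details(signal):
    """One level of Haar detail coefficients; the length is assumed to be even."""
    return [(signal[i] - signal[i + 1]) / math.sqrt(2)
            for i in range(0, len(signal), 2)]

def dft_magnitudes(signal):
    """Magnitudes of the discrete Fourier transform, computed directly."""
    n = len(signal)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal)))
            for k in range(n)]

spike = [0.0] * 8
spike[5] = 1.0  # a single event at time index 5

details = haar_details(spike)       # only the coefficient covering indices 4-5 is nonzero
magnitudes = dft_magnitudes(spike)  # every frequency bin has the same magnitude
```

The wavelet coefficients show where the spike sits, whereas the Fourier magnitudes spread it evenly over all frequencies and keep the timing only in the phase.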

Why ALiBi allows length extrapolation better than learned position embeddings

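ALiBi uses no position embeddings. Instead, it adds to each attention score a penalty proportional to the query-key distance, with a fixed slope per head. The bias is defined for any distance, so there is no fixed-size embedding table to outgrow, which helps with sequences longer than those seen in training.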
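A rough sketch of the bias, assuming causal attention and a power-of-two number of heads with geometric slopes. The function names `alibi_slopes` and `alibi_bias` are hypothetical.

```python
def alibi_slopes(num_heads):
    """Geometric slopes 2^(-8/n), 2^(-16/n), ...; assumes num_heads is a power of two."""
    return [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]

def alibi_bias(seq_len, slope):
    """Bias added to attention scores: -slope * distance, with future keys masked."""
    return [[-slope * (i - j) if j <= i else float("-inf")
             for j in range(seq_len)]
            for i in range(seq_len)]

slopes = alibi_slopes(8)         # 1/2, 1/4, ..., 1/256
bias = alibi_bias(4, slopes[0])  # works for any seq_len, including unseen lengths
```

Nothing in `alibi_bias` depends on a maximum length, which is the property that lets the same rule apply to longer inputs.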

Principal component analysis

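The eigenvectors of the data's covariance matrix point along the directions of maximum variance, and the corresponding eigenvalues give the variance along each direction.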
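As an illustration, the sketch below finds the first principal component of a few made-up 2-D points with power iteration. Power iteration is a stand-in chosen to avoid dependencies, and the function names and data are assumptions; practical code would typically call a library eigensolver or SVD.

```python
import math

def covariance_matrix(points):
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - means[j] for j in range(d)] for p in points]
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
            for a in range(d)]

def top_eigenvector(matrix, iterations=200):
    """Power iteration: repeated multiplication converges to the leading eigenvector."""
    d = len(matrix)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(matrix[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Points scattered close to the line y = x.
points = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.1), (3.0, 2.9), (4.0, 4.1)]
pc1 = top_eigenvector(covariance_matrix(points))  # roughly the diagonal direction
```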

Why transformers use LayerNorm, not BatchNorm

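LayerNorm normalizes each token across its own features, so it does not depend on batch size or sequence length. BatchNorm relies on statistics computed across the batch, which become unreliable with variable-length sequences and small batches.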
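A minimal sketch of per-token normalization, omitting the learned gain and bias that a full LayerNorm would include. The function name and the example sequences are illustrative.

```python
import math

def layer_norm(features, eps=1e-5):
    """Normalize one token's feature vector to zero mean and unit variance."""
    mean = sum(features) / len(features)
    var = sum((x - mean) ** 2 for x in features) / len(features)
    return [(x - mean) / math.sqrt(var + eps) for x in features]

short_sequence = [[1.0, 2.0, 3.0, 4.0]]
long_sequence = [[1.0, 2.0, 3.0, 4.0],
                 [10.0, 0.0, -10.0, 0.0],
                 [5.0, 5.0, 6.0, 6.0]]

# Each token is normalized independently of its neighbors.
normed_short = [layer_norm(token) for token in short_sequence]
normed_long = [layer_norm(token) for token in long_sequence]
```

The first token gets the same output in both sequences, since no statistic is shared across tokens or across the batch.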
