RoPE encodes position by multiplying Q,K by R(θ_i) at each position
Image: Unidentified U.S. Army photographer, Public domain, via Wikimedia Commons
RoPE encodes position by multiplying Q,K by R(θ_i) at each position
rotary position embeddings (RoPE) do
RoPE encodes relative position by applying rotation matrices to input features
RoPE's advantage is: supports length extrapolation beyond training context length
RoPE (Relative Position Encoding) advantage: supports length extrapolation beyond training context length
sinusoidal position encoding works: each dimension has a different frequency
Sinusoidal position encoding assigns unique frequencies to each dimension, enabling the model to distinguish positions effectively
QR decomposition
QR decomposition factors A = QR, where Q is orthogonal, R is upper triangular
Matrix multiplication algorithm
Tiling divides matrices into smaller blocks, loading them into shared memory for efficient matrix multiplication
Hamming distance
Hamming distance measures the number of differing positions between two strings
One email a day: 5 concepts + the 5 stories that matter →
Swipe through 100 ML concepts daily
Open TickerNews