Why ALiBi allows length extrapolation better than learned position embeddings

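ALiBi adds no position embeddings. Instead it subtracts a fixed, head-specific linear penalty from each attention score in proportion to the query-key distance, so the same rule applies unchanged to sequences longer than those seen in training.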
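
As a rough illustration, the sketch below builds the ALiBi bias that is added to the attention scores before the softmax. The function names `alibi_slopes` and `alibi_bias` are hypothetical, and the slope formula assumes the number of heads is a power of two. The point it shows is that the bias for a query-key pair depends only on their distance, so nothing has to be learned or resized for a longer sequence.

```python
def alibi_slopes(num_heads):
    """Geometric head slopes 2^(-8/n), 2^(-16/n), ... (assumes n is a power of two)."""
    return [2 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]


def alibi_bias(seq_len, slope):
    """Causal bias matrix: entry [i][j] is -slope * (i - j) for j <= i, -inf for future keys."""
    return [
        [-slope * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]


slopes = alibi_slopes(8)                  # one slope per attention head
bias_short = alibi_bias(4, slopes[0])     # a "training length" example
bias_long = alibi_bias(16, slopes[0])     # a longer sequence, same rule

# The penalty for a key three positions back is identical at both lengths.
print(bias_short[3][0], bias_long[3][0])
```

Each head would add its own matrix to the raw query-key scores; no parameter in the sketch depends on `seq_len`.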

Related concepts

What weight tying does in language models: shares embedding and output projection matrices

Weight tying uses a single matrix as both the input embedding and the output projection, which removes one vocab-size × hidden-size matrix from the parameter count.
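
A minimal sketch of the idea in plain Python, with a hypothetical `TiedLM` class standing in for a real model: the same matrix `E` is read row-wise to embed a token and multiplied against a hidden state to produce logits.

```python
import random


class TiedLM:
    """Toy model holding one shared matrix for embedding and output projection."""

    def __init__(self, vocab_size, d_model, seed=0):
        rng = random.Random(seed)
        # Single shared matrix of shape (vocab_size, d_model).
        self.E = [[rng.gauss(0, 0.02) for _ in range(d_model)] for _ in range(vocab_size)]

    def embed(self, token_id):
        # Input side: look up one row of E.
        return self.E[token_id]

    def logits(self, hidden):
        # Output side: dot the hidden state with every row of the same E.
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.E]

    def num_params(self):
        return len(self.E) * len(self.E[0])


model = TiedLM(vocab_size=10, d_model=4)
hidden = model.embed(3)
scores = model.logits(hidden)
print(model.num_params())  # an untied model would hold twice as many here
```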

Why the curse of dimensionality makes nearest neighbor search unreliable

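As dimensionality grows, the distances from a query to its nearest and farthest points converge, so the "nearest" neighbor is barely closer than any other point and small perturbations can change the ranking.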
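
The effect can be seen in a small experiment. The sketch below (the function name `relative_contrast` and the uniform random data are assumptions chosen for illustration) measures how much farther the farthest point is than the nearest one, relative to the nearest distance.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(max distance - min distance) / min distance from a random query to random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)


low = relative_contrast(2)      # 2 dimensions: nearest point is far closer than the farthest
high = relative_contrast(1000)  # 1000 dimensions: all points sit at nearly the same distance
print(low, high)
```

A contrast close to zero means the nearest neighbor is almost indistinguishable from every other point.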

What AWQ does differently — activation-aware weight quantization preserves important weights

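AWQ uses activation statistics to find the small fraction of weight channels that matter most, then scales those channels before quantization so they lose less precision, without keeping any weights in higher precision.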
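
The following is a heavily simplified sketch of that idea, not the reference implementation. It assumes per-row round-to-nearest 4-bit quantization, a per-input-channel scale of the form (mean activation magnitude)^alpha, and a small grid search over alpha. All function names and the toy data are hypothetical.

```python
import random


def quantize_row(row, bits=4):
    """Symmetric round-to-nearest quantization of one weight row, returned dequantized."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in row) / qmax or 1.0
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in row]


def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def output_error(W, X, s, bits=4):
    """Squared output error when input channel j of W is multiplied by s[j] before quantizing."""
    Wq = [quantize_row([w * sj for w, sj in zip(row, s)], bits) for row in W]
    err = 0.0
    for x in X:
        ref = matvec(W, x)
        # Dividing the activations by s keeps the full-precision product unchanged.
        out = matvec(Wq, [v / sj for v, sj in zip(x, s)])
        err += sum((a - b) ** 2 for a, b in zip(ref, out))
    return err


def search_scales(W, X, bits=4, grid=11):
    """Grid-search alpha in [0, 1]; alpha = 0 means no scaling (plain rounding)."""
    n_in = len(W[0])
    act_mag = [sum(abs(x[j]) for x in X) / len(X) for j in range(n_in)]
    best = None
    for k in range(grid):
        alpha = k / (grid - 1)
        s = [m ** alpha for m in act_mag]
        err = output_error(W, X, s, bits)
        if best is None or err < best[0]:
            best = (err, alpha, s)
    return best


rng = random.Random(0)
W = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]
# Toy calibration data: input channel 0 carries much larger activations.
X = [[rng.gauss(0, 1) * (10.0 if j == 0 else 1.0) for j in range(8)] for _ in range(32)]

plain_err = output_error(W, X, [1.0] * 8)
best_err, best_alpha, best_s = search_scales(W, X)
print(plain_err, best_err, best_alpha)
```

Because alpha = 0 is part of the grid, the searched result can never be worse than plain rounding on the calibration data.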

What score matching does: learns the gradient of the log-density without normalizing

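Score matching trains a model of the score, the gradient of the log-density with respect to the data, which does not depend on the normalizing constant, so the partition function never has to be computed.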
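
For reference, the standard objective can be written as below. The notation is an assumption introduced here: $s_\theta$ is the model score, $E_\theta$ an energy function, and $Z_\theta$ its normalizing constant. The second line of the objective holds under the usual regularity conditions (integration by parts) and no longer involves the unknown data score.

```latex
\begin{align}
p_\theta(x) &= \frac{\exp\!\big(-E_\theta(x)\big)}{Z_\theta}
\quad\Longrightarrow\quad
\nabla_x \log p_\theta(x) = -\nabla_x E_\theta(x)
\quad (\text{since } \nabla_x \log Z_\theta = 0), \\[4pt]
J(\theta) &= \tfrac{1}{2}\,\mathbb{E}_{p(x)}\!\left[\big\lVert s_\theta(x) - \nabla_x \log p(x) \big\rVert^2\right] \\
          &= \mathbb{E}_{p(x)}\!\left[\operatorname{tr}\!\big(\nabla_x s_\theta(x)\big) + \tfrac{1}{2}\big\lVert s_\theta(x) \big\rVert^2\right] + \text{const}.
\end{align}
```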

Greedy vs beam search decoding: greedy picks best token, beam maintains k candidates

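Greedy decoding commits to the single highest-probability token at each step, while beam search keeps the k highest-scoring partial sequences and can recover a sequence that greedy misses.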
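
A small sketch makes the difference concrete. The `PROBS` table is a made-up toy model in which the next-token distribution depends only on the previous token. It is constructed so that the locally best first token leads to a worse overall sequence.

```python
import math

# Hypothetical toy model: next-token probabilities given the previous token.
PROBS = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.55, "y": 0.45},
    "b": {"x": 0.9, "y": 0.1},
}


def greedy(steps=2):
    """Pick the single most probable token at every step."""
    seq, logp = ["<s>"], 0.0
    for _ in range(steps):
        tok, p = max(PROBS[seq[-1]].items(), key=lambda kv: kv[1])
        seq.append(tok)
        logp += math.log(p)
    return seq[1:], logp


def beam_search(k, steps=2):
    """Keep the k highest-scoring partial sequences at every step."""
    beams = [(0.0, ["<s>"])]
    for _ in range(steps):
        candidates = [
            (logp + math.log(p), seq + [tok])
            for logp, seq in beams
            for tok, p in PROBS[seq[-1]].items()
        ]
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
    best_logp, best_seq = beams[0]
    return best_seq[1:], best_logp


greedy_seq, greedy_logp = greedy()
beam_seq, beam_logp = beam_search(k=2)
print(greedy_seq, beam_seq)
```

Here greedy commits to "a" first and ends with total probability 0.33, while a beam of width 2 keeps "b" alive and finds the sequence with probability 0.36. With k = 1, beam search reduces to greedy decoding.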

What consistent hashing does: minimizes remapping when nodes join/leave

Consistent hashing places nodes and keys on the same hash ring, so adding or removing a node remaps only the keys adjacent to it (about 1/N of them) instead of nearly all keys.
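
One common way to realize this is a sorted ring with virtual nodes, sketched below. The `HashRing` class, the use of MD5 as the ring hash, and the choice of 100 virtual nodes per server are illustrative assumptions rather than a prescribed design.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a position on the ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node owns several points on the ring to even out the load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def lookup(self, key):
        # First ring point at or after the key's position, wrapping around.
        idx = bisect.bisect(self._ring, (_hash(key), "")) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"key-{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}

ring.add("node-d")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys moved")
```

Every key that moves lands on the new node; keys owned by the other nodes stay where they were, whereas a plain `hash(key) % N` scheme would reshuffle most of them.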
