Matryoshka embeddings: Trained to be useful at multiple truncated dimensions
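
A minimal sketch of what "truncated dimensions" means in practice: keep the first few coordinates of an embedding and rescale the result to unit length. The function name `truncate_and_normalize` and the example vector are invented for illustration, and the truncated vectors are only expected to stay useful when the model was trained with a Matryoshka-style objective.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` coordinates and rescale them to unit length."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # nothing to rescale
    return [x / norm for x in head]

# Hypothetical 8-dimensional embedding, made up for this example.
full = [0.6, -0.3, 0.5, 0.2, -0.1, 0.4, 0.05, -0.25]

for dim in (2, 4, 8):
    short = truncate_and_normalize(full, dim)
    print(dim, [round(x, 3) for x in short])
```

Renormalizing after truncation keeps cosine and dot-product comparisons consistent across the different lengths.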

Related concepts

weight tying in language models: shares embedding and output projection matrices

Tying reduces the number of parameters by sharing embedding and output projection matrices
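
The saving can be sketched as simple arithmetic. The helper `embedding_param_count` and the vocabulary and model sizes below are hypothetical, and the count ignores any bias term on the output layer.

```python
def embedding_param_count(vocab_size, d_model, tied):
    """Parameters in the input embedding plus the output projection."""
    input_embedding = vocab_size * d_model
    # With tying, the output projection reuses the embedding matrix.
    output_projection = 0 if tied else vocab_size * d_model
    return input_embedding + output_projection

# Hypothetical sizes, chosen only for illustration.
vocab_size, d_model = 50_000, 768

untied = embedding_param_count(vocab_size, d_model, tied=False)
tied = embedding_param_count(vocab_size, d_model, tied=True)
print(untied, tied, untied - tied)
```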

random projection to O(log n/ε²) dimensions preserves pairwise distances within 1±ε

Random projection to O(log n/ε²) dimensions preserves all pairwise distances within a factor of 1±ε with high probability, as guaranteed by the Johnson-Lindenstrauss lemma
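
A small experiment can make this concrete. The sketch below assumes a dense Gaussian projection matrix scaled by 1/sqrt(k), which is one common construction among several; the sizes `d`, `k`, `n` and the seed are arbitrary choices for illustration.

```python
import math
import random

def random_projection_matrix(d, k, rng):
    """k x d matrix of N(0, 1) entries scaled by 1/sqrt(k)."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]

def project(matrix, vector):
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

rng = random.Random(0)
d, k, n = 500, 200, 8  # arbitrary sizes for the demo
points = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
matrix = random_projection_matrix(d, k, rng)
projected = [project(matrix, p) for p in points]

# Ratio of projected distance to original distance for every pair.
ratios = [
    distance(projected[i], projected[j]) / distance(points[i], points[j])
    for i in range(n)
    for j in range(i + 1, n)
]
print(round(min(ratios), 3), round(max(ratios), 3))
```

Ratios close to 1 mean the pairwise distances survived the drop from 500 to 200 dimensions; shrinking `k` should widen the spread.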

ALiBi allows length extrapolation better than learned position embeddings

ALiBi adds a fixed penalty to attention scores that grows linearly with the distance between query and key, instead of using learned position embeddings, so it is not tied to the sequence lengths seen during training
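
A minimal sketch of the ALiBi bias for one attention head follows. It assumes a causal decoder and a power-of-two head count, for which the ALiBi paper uses the geometric slope sequence 2^(-8/n), 2^(-16/n), and so on; the function names are invented for illustration.

```python
def alibi_slopes(num_heads):
    """Per-head slopes; assumes num_heads is a power of two."""
    return [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]

def alibi_bias(seq_len, slope):
    """Bias added to attention scores before softmax.

    bias[i][j] = slope * (j - i) for keys at or before the query,
    and -inf for future keys (the usual causal mask).
    """
    return [
        [slope * (j - i) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]

slopes = alibi_slopes(8)
for row in alibi_bias(4, slopes[0]):
    print(row)
```

Because the bias depends only on the offset `j - i`, the same rule applies unchanged to sequences longer than any seen in training.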

what the Johnson-Lindenstrauss lemma says

Random projection reduces dimensionality while approximately preserving pairwise distances
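
Stated a little more formally, in its usual squared-distance form, where C is an unspecified absolute constant:

```latex
\text{For any } 0 < \varepsilon < 1 \text{ and any set } X \subset \mathbb{R}^d
\text{ of } n \text{ points, if } k \ge C\,\varepsilon^{-2}\log n
\text{ there is a linear map } f : \mathbb{R}^d \to \mathbb{R}^k \text{ such that}
\[
(1-\varepsilon)\,\lVert u - v\rVert^2
\;\le\; \lVert f(u) - f(v)\rVert^2
\;\le\; (1+\varepsilon)\,\lVert u - v\rVert^2
\qquad \text{for all } u, v \in X .
\]
```

The target dimension k depends on the number of points n and the tolerance ε, not on the original dimension d.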

cosine similarity is preferred over dot product for normalized embeddings

Cosine similarity measures orientation, not magnitude; on unit-normalized embeddings it equals the dot product, so the two rank neighbors identically
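
The difference between the two scores, and the point where it disappears, can be checked with a few lines. The vectors below are made up for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def cosine(u, v):
    return dot(u, v) / (norm(u) * norm(v))

def normalize(u):
    length = norm(u)
    return [a / length for a in u]

a = [3.0, 4.0]
b = [30.0, 40.0]  # same direction as a, ten times the magnitude
c = [4.0, 3.0]    # different direction, same magnitude as a

# The dot product grows with magnitude; cosine similarity does not.
print(dot(a, b), cosine(a, b))

# After normalizing, the dot product and cosine similarity agree.
print(dot(normalize(a), normalize(c)), cosine(a, c))
```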

batch size affects generalization: larger batches find sharper minima

Larger batch sizes tend to converge to sharper minima, which are associated with worse generalization; the noisier gradient estimates from smaller batches act as an implicit regularizer that favors flatter minima
