768-dim BERT embeddings capture bidirectional context from masked language modeling

Related concepts

What weight tying does in language models: shares the embedding and output projection matrices

Weight tying reduces the parameter count by using a single matrix as both the input embedding and the output projection
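
As a rough illustration of the card above, here is a minimal pure-Python sketch of weight tying. The toy sizes, the helper names (`embed`, `output_logits`, `embedding_and_output_params`), and the 30522 x 768 example shape are assumptions chosen for illustration, not taken from any particular model's code.

```python
import random

VOCAB_SIZE, HIDDEN_DIM = 5, 3  # toy sizes, chosen only for illustration

random.seed(0)
# One shared matrix: row i is the vector for token i.
shared = [[random.uniform(-1, 1) for _ in range(HIDDEN_DIM)]
          for _ in range(VOCAB_SIZE)]

def embed(token_id):
    """Input side: look up the token's row in the shared matrix."""
    return shared[token_id]

def output_logits(hidden):
    """Output side: score every vocabulary item with the same rows
    (equivalent to hidden @ shared^T)."""
    return [sum(h * w for h, w in zip(hidden, row)) for row in shared]

def embedding_and_output_params(vocab_size, hidden_dim, tied):
    """Parameters in the embedding plus output projection (biases ignored)."""
    per_matrix = vocab_size * hidden_dim
    return per_matrix if tied else 2 * per_matrix

logits = output_logits(embed(2))

# Hypothetical example shape: a 30522-token vocabulary with 768-dim vectors.
saved = (embedding_and_output_params(30522, 768, tied=False)
         - embedding_and_output_params(30522, 768, tied=True))
```

With these assumed sizes, tying removes one full vocabulary-by-hidden matrix from the parameter count.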

Cosine similarity is preferred over dot product when embedding magnitude should not affect the score

Cosine similarity measures orientation, not magnitude. For unit-normalized embeddings it is identical to the dot product, so either can be used there
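
The difference between the two scores can be checked on small vectors. The following is a minimal sketch; the vectors and helper names are made up for illustration.

```python
import math

def dot(a, b):
    """Plain dot product: sensitive to both direction and length."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Dot product divided by both vector lengths: direction only."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

a = [3.0, 4.0]
b = [6.0, 8.0]  # same direction as a, twice the length
c = [4.0, 3.0]  # different direction, same length as a

# Doubling the length doubles the dot product but leaves cosine unchanged.
raw_scores = (dot(a, a), dot(a, b))
cos_scores = (cosine(a, a), cosine(a, b))

# After unit normalization, the dot product and cosine similarity agree.
normalized_dot = dot(l2_normalize(a), l2_normalize(c))
```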

384-dim all-MiniLM-L6-v2 optimizes: fast sentence similarity with 6 layers

all-MiniLM-L6-v2 is a compact 6-layer model that produces 384-dim sentence embeddings, optimized for fast sentence similarity

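ALiBi extrapolates to longer sequences better than learned position embeddings

ALiBi adds a penalty proportional to query-key distance directly to the attention scores instead of learning one embedding per position, so it has no fixed maximum length and handles sequences longer than those seen in training more gracefully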
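
A minimal sketch of the ALiBi bias follows, assuming a causal decoder and the geometric slope schedule described in the ALiBi paper for power-of-two head counts. The function names are illustrative, not from any library.

```python
def alibi_slopes(num_heads):
    """Per-head slopes 2^(-8/n), 2^(-16/n), ..., 2^(-8).
    Assumes num_heads is a power of two."""
    return [2 ** (-8 * (h + 1) / num_heads) for h in range(num_heads)]

def alibi_bias(seq_len, slope):
    """Causal bias matrix added to attention scores before softmax.
    Entry [i][j] is -slope * (i - j) for j <= i and -inf for future keys."""
    return [
        [-slope * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]

slopes = alibi_slopes(8)
short = alibi_bias(4, slopes[0])
longer = alibi_bias(6, slopes[0])
```

Because the bias depends only on the distance `i - j`, the matrix for a longer sequence contains the shorter one as its top-left block, and nothing has to be learned or resized for new positions.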

3072-dim OpenAI text-embedding-3-large is used for: semantic search and RAG

Used for semantic search and retrieval-augmented generation (RAG), where retrieved passages ground a language model's answers

mean pooling often outperforms [CLS] for sentence similarity tasks

Mean pooling averages the embeddings of all non-padding tokens, which often captures overall sentence meaning better than the [CLS] token embedding alone
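
The two pooling strategies can be compared on a toy example. This is a minimal sketch with made-up 2-dim token vectors. It assumes the first position holds [CLS] and that the attention mask marks padding with 0.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1, skipping padding."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            total = [t + v for t, v in zip(total, vec)]
            count += 1
    return [t / max(count, 1) for t in total]

def cls_pool(token_embeddings):
    """Use only the first ([CLS]) token vector as the sentence embedding."""
    return token_embeddings[0]

tokens = [
    [1.0, 0.0],  # [CLS]
    [2.0, 4.0],
    [3.0, 2.0],
    [9.0, 9.0],  # padding, must not influence the result
]
mask = [1, 1, 1, 0]

sentence_vec = mean_pool(tokens, mask)
cls_vec = cls_pool(tokens)
```

Forgetting the mask is a common bug: averaging over padding positions would pull the sentence vector toward whatever the padding vectors contain.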
