![mean pooling often outperforms [CLS] for sentence similarity tasks](https://upload.wikimedia.org/wikipedia/commons/e/e0/Iran_econ.jpg)
Mean pooling often captures overall sentence meaning better than the [CLS] token embedding
Image: These maps and charts are scanned from "Atlas of the Middle East", published in January 1993 by the U.S. Central Intelli, Public domain, via Wikimedia Commons
[CLS] pooling: uses the embedding of the first token, [CLS], as the sentence representation
Mean pooling: averages all token embeddings to produce a single sentence embedding
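The two pooling strategies can be sketched in a few lines of plain Python. This is a minimal illustration on toy vectors, not any library's implementation. The `attention_mask` argument is an assumption added here so that padding positions do not dilute the average; the function names and the toy values are hypothetical.

```python
from typing import List

Vector = List[float]


def cls_pooling(token_embeddings: List[Vector]) -> Vector:
    """Return a copy of the first token's embedding (the [CLS] position)."""
    return list(token_embeddings[0])


def mean_pooling(token_embeddings: List[Vector], attention_mask: List[int]) -> Vector:
    """Average the embeddings of real tokens, skipping positions where mask == 0."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, x in enumerate(vec):
                total[i] += x
            count += 1
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [x / count for x in total]


# Toy input (assumed values): three real tokens followed by one padding position.
tokens = [
    [1.0, 0.0],  # [CLS]
    [0.0, 2.0],
    [2.0, 4.0],
    [9.0, 9.0],  # padding, excluded by the mask
]
mask = [1, 1, 1, 0]

cls_vec = cls_pooling(tokens)           # [1.0, 0.0]
mean_vec = mean_pooling(tokens, mask)   # [1.0, 2.0]
```

Here [CLS] pooling sees only the first row, while mean pooling reflects every real token in the sentence.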
Weight tying in language models: shares one matrix between the input embedding and the output projection, which reduces the number of parameters
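To make the parameter saving concrete, the sketch below counts the weights in the embedding and output projection with and without tying, and shows the tied matrix being reused to produce logits. The vocabulary size, hidden size, and function names are illustrative assumptions, and bias terms are ignored.

```python
from typing import List


def embedding_and_output_params(vocab_size: int, hidden_size: int, tied: bool) -> int:
    """Count weights in the input embedding plus the output projection (no bias)."""
    embedding = vocab_size * hidden_size
    output_projection = 0 if tied else vocab_size * hidden_size
    return embedding + output_projection


def tied_logits(hidden: List[float], embedding_matrix: List[List[float]]) -> List[float]:
    """Score every vocabulary entry by reusing the embedding rows as output weights."""
    return [sum(h * w for h, w in zip(hidden, row)) for row in embedding_matrix]


# Illustrative sizes (assumed, not taken from a specific model).
vocab_size, hidden_size = 30_000, 768
untied = embedding_and_output_params(vocab_size, hidden_size, tied=False)  # 46_080_000
tied = embedding_and_output_params(vocab_size, hidden_size, tied=True)     # 23_040_000
saved = untied - tied                                                      # 23_040_000

# Tiny tied model: 3 vocabulary entries, hidden size 2.
E = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
logits = tied_logits([2.0, 3.0], E)  # [2.0, 3.0, 5.0]
```

With tying, the same `E` serves for both the lookup and the output scores, so the second vocabulary-sized matrix is never allocated.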
all-MiniLM-L6-v2 (384-dim): optimized for fast sentence similarity, using 6 transformer layers
OpenAI text-embedding-3-large (3072-dim by default; the 1536-dim model is text-embedding-3-small): used for semantic search and RAG, where retrieved passages give a language model extra context
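Semantic search with embeddings comes down to ranking stored vectors by similarity to a query vector. The sketch below assumes the embeddings have already been computed; the three-dimensional vectors, document ids, and the `search` helper are hypothetical stand-ins for real model output and a real vector store.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vec: List[float], index: Dict[str, List[float]], top_k: int = 2) -> List[Tuple[str, float]]:
    """Return the top_k (doc_id, score) pairs, highest similarity first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy index (assumed values); real embeddings would have hundreds or thousands of dimensions.
index = {
    "doc_pooling": [0.9, 0.1, 0.0],
    "doc_weight_tying": [0.1, 0.9, 0.0],
    "doc_cooking": [0.0, 0.1, 0.9],
}
query = [0.8, 0.2, 0.0]

results = search(query, index)
top_ids = [doc_id for doc_id, _ in results]  # ["doc_pooling", "doc_weight_tying"]
```

In a RAG setup, the text behind the top-ranked ids would then be passed to the language model as context.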
Cosine similarity vs. dot product: cosine similarity measures orientation, not magnitude, so it is preferred when embedding lengths vary
For unit-normalized embeddings, cosine similarity and the dot product give identical scores
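The relationship between the two measures can be checked on toy vectors. This is a minimal sketch with assumed values: it shows that the raw dot product grows with vector length, that cosine similarity ignores length, and that the two agree once vectors are scaled to unit length.

```python
import math
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: List[float]) -> float:
    return math.sqrt(dot(a, a))


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Dot product divided by both lengths, so only direction matters."""
    return dot(a, b) / (norm(a) * norm(b))


def normalize(a: List[float]) -> List[float]:
    """Scale a non-zero vector to unit length."""
    n = norm(a)
    return [x / n for x in a]


a = [3.0, 4.0]
b = [6.0, 8.0]  # same direction as a, twice the length
c = [4.0, 3.0]  # different direction, same length as a

raw_dot_ab = dot(a, b)                   # 50.0, inflated by b's length
raw_dot_aa = dot(a, a)                   # 25.0, although a and b point the same way
cos_ab = cosine_similarity(a, b)         # 1.0, direction only
cos_ac = cosine_similarity(a, c)
unit_dot_ac = dot(normalize(a), normalize(c))

# On unit-length vectors the dot product equals cosine similarity.
same_score = math.isclose(cos_ac, unit_dot_ac)
```

Because the scores match after normalization, many pipelines normalize once at indexing time and then use the cheaper dot product.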