Mean pooling often outperforms [CLS] for sentence similarity tasks

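Mean pooling often captures overall sentence meaning better than the [CLS] token embedding, because it averages information from every token instead of relying on a single position.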

Related concepts

CLS pooling: uses the first token's embedding (the [CLS] token) as the sentence representation

Mean pooling: averages all token embeddings to create a sentence embedding
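
The two pooling strategies can be sketched on toy data. This is a minimal illustration, not any library's implementation: the function names `cls_pool` and `mean_pool`, the 2-dimensional vectors, and the attention mask are all assumptions made for the example. It assumes the common convention that padding tokens (mask value 0) are excluded from the average.

```python
def cls_pool(token_embeddings):
    """Return the first token's vector ([CLS] in BERT-style models)."""
    return list(token_embeddings[0])


def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping padding positions (mask == 0)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, x in enumerate(vec):
                total[i] += x
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [t / count for t in total]


# Hypothetical 2-dimensional embeddings for: [CLS], "good", "movie", [PAD]
tokens = [[1.0, 0.0], [2.0, 4.0], [3.0, 2.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]  # the final position is padding

cls_vec = cls_pool(tokens)          # depends on one position only
mean_vec = mean_pool(tokens, mask)  # depends on every real token
```

Here `cls_vec` ignores the content tokens entirely, while `mean_vec` changes whenever any non-padding token changes.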

Weight tying in language models: shares the embedding and output projection matrices, which reduces the number of parameters
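
A rough sketch of the idea, assuming a toy vocabulary of 5 tokens and a hidden size of 3 (both hypothetical, as are the helper names `embed` and `output_logits`). One matrix serves as the input embedding table and, transposed, as the output projection.

```python
vocab_size, hidden_dim = 5, 3

# One shared matrix: row i is the embedding of token i.
embedding = [[0.1 * (i + j) for j in range(hidden_dim)]
             for i in range(vocab_size)]


def embed(token_id):
    """Input side: look up the token's row."""
    return embedding[token_id]


def output_logits(hidden_state):
    """Output side: reuse the same matrix, one dot product per vocabulary row."""
    return [sum(w * h for w, h in zip(row, hidden_state)) for row in embedding]


# Parameter counts for the embedding and output layers only (biases ignored).
untied_params = 2 * vocab_size * hidden_dim  # two separate matrices
tied_params = vocab_size * hidden_dim        # one shared matrix

logits = output_logits(embed(2))  # one score per vocabulary entry
```

With realistic sizes (vocabularies in the tens of thousands, hidden sizes in the hundreds or thousands), the saved matrix can be a sizeable share of a small model's parameters.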

all-MiniLM-L6-v2 (384 dimensions): a 6-layer model optimized for fast sentence similarity

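OpenAI text-embedding-3-large (3072 dimensions by default, reducible to sizes such as 1536 through the API's `dimensions` parameter): used for semantic search and retrieval-augmented generation (RAG), where retrieved passages give a language model relevant context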

Cosine similarity: measures orientation, not magnitude. For unit-normalized embeddings it equals the dot product, so the two produce identical rankings; for unnormalized embeddings, cosine similarity removes the effect of vector length.
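
A short sketch of how cosine similarity relates to the dot product, using hypothetical 2-dimensional vectors and helper names chosen for this example. It shows that cosine similarity ignores vector length, and that after unit normalization the plain dot product gives the same value.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def cosine_similarity(a, b):
    """Dot product divided by both lengths: depends on direction only."""
    return dot(a, b) / (norm(a) * norm(b))


def normalize(a):
    """Scale a vector to unit length."""
    n = norm(a)
    return [x / n for x in a]


a = [3.0, 4.0]
b = [6.0, 8.0]  # same direction as a, twice the length
c = [4.0, 3.0]  # different direction

same_direction = cosine_similarity(a, b)   # unaffected by b's extra length
raw_dot = dot(a, b)                        # grows with vector length
cos_ac = cosine_similarity(a, c)
dot_ac_unit = dot(normalize(a), normalize(c))  # matches cos_ac
```

Because the two measures agree on unit vectors, many systems normalize embeddings once at indexing time and then use the cheaper dot product at query time.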
