all-MiniLM-L6-v2 is optimized for fast sentence similarity with just 6 layers
Image: Petar Milošević, CC BY-SA 4.0, via Wikimedia Commons
mean pooling often outperforms [CLS] for sentence similarity tasks
Mean pooling averages every token embedding, so it often captures overall sentence meaning better than the [CLS] token embedding alone
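To make the difference concrete, here is a minimal pure-Python sketch of both pooling strategies. The function names `mean_pool` and `cls_pool` and the toy 2-dimensional vectors are hypothetical, chosen only for illustration; real models produce much larger vectors, and libraries typically do this with tensor operations.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the embeddings of the tokens whose mask value is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vec):
                total[i] += value
            count += 1
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [t / count for t in total]


def cls_pool(token_embeddings: List[List[float]]) -> List[float]:
    """Use only the first token's embedding (the [CLS] position in BERT-style models)."""
    return list(token_embeddings[0])


# Toy input: 4 token positions, the last one is padding (made-up values).
tokens = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]

mean_vec = mean_pool(tokens, mask)  # [3.0, 2.0]
cls_vec = cls_pool(tokens)          # [1.0, 0.0]
```

The mask matters: without it, the padding vector would be averaged in and distort the sentence representation.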
batch size affects generalization: larger batches find sharper minima
Larger batch sizes give less noisy gradient estimates but tend to converge to sharper minima, which often generalize worse; the gradient noise of smaller batches favors flatter minima
the vocabulary size matters: larger vocab = shorter sequences but more parameters
A larger vocabulary shortens tokenized sequences but enlarges the embedding and output layers, adding parameters
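The parameter side of this trade-off is simple arithmetic: a token embedding table holds one vector per vocabulary entry. The sketch below assumes a hypothetical helper `embedding_params` and uses the published sizes of BERT-base (vocabulary 30,522, hidden size 768) and GPT-2 small (vocabulary 50,257, hidden size 768) purely as illustrations. It counts only the token embedding table, not position embeddings or an untied output layer.

```python
def embedding_params(vocab_size: int, hidden_dim: int) -> int:
    """Number of weights in a token embedding table: one hidden_dim vector per token."""
    return vocab_size * hidden_dim


bert_base = embedding_params(30_522, 768)    # 23,440,896
gpt2_small = embedding_params(50_257, 768)   # 38,597,376

# Extra weights from the larger vocabulary at the same hidden size.
extra = gpt2_small - bert_base               # 15,156,480
```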
3072-dim OpenAI text-embedding-3-large is used for: semantic search and RAG
Used for semantic search and retrieval-augmented generation (RAG), where embeddings retrieve relevant passages to ground a language model's answers
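Semantic search over embeddings usually comes down to ranking documents by cosine similarity to the query vector. The following is a minimal sketch with made-up 3-dimensional vectors standing in for real embeddings; the names `cosine`, `top_k`, `docs`, and `query` are hypothetical, and no embedding API is called here.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: List[float], docs: Dict[str, List[float]], k: int = 2) -> List[str]:
    """Return the names of the k documents most similar to the query."""
    scored = sorted(docs.items(), key=lambda item: cosine(query, item[1]), reverse=True)
    return [name for name, _ in scored[:k]]


# Toy "embeddings" (invented values, not model output).
docs = {
    "pooling": [0.9, 0.1, 0.0],
    "batch size": [0.0, 1.0, 0.2],
    "tokenizers": [0.1, 0.0, 1.0],
}
query = [1.0, 0.2, 0.0]

ranked = top_k(query, docs)  # ["pooling", "batch size"]
```

In a RAG setup, the text of the top-ranked documents would then be placed in the language model's prompt as supporting context.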
768-dim BERT embeddings capture: bidirectional context from masked language modeling
768-dim BERT embeddings capture bidirectional context from masked language modeling
[CLS] pooling does: uses the first token's embedding as the sentence representation
CLS pooling: uses the first token's embedding as the sentence representation