[CLS] marks the start of the input, [SEP] separates segments (such as two sentences), [PAD] fills shorter sequences up to a common length, and [MASK] hides tokens the model must predict
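A minimal sketch of how these special tokens are typically laid out for a BERT-style sentence pair, in plain Python. The helper names (`build_pair_input`, `mask_positions`) are hypothetical, and masking fixed positions is a simplification for determinism; BERT-style pretraining chooses the masked positions at random.

```python
# Special tokens used by BERT-style models (string placeholders, not real vocabulary IDs).
SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[PAD]", "[MASK]"}


def build_pair_input(tokens_a, tokens_b, max_len):
    """Lay out a sentence pair as [CLS] A [SEP] B [SEP], then pad with [PAD] to max_len."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    if len(tokens) > max_len:
        raise ValueError("input is longer than max_len")
    n_pad = max_len - len(tokens)
    # attention_mask: 1 for real tokens, 0 for padding the model should ignore
    attention_mask = [1] * len(tokens) + [0] * n_pad
    return tokens + ["[PAD]"] * n_pad, attention_mask


def mask_positions(tokens, positions):
    """Hide the tokens at `positions` behind [MASK]; return masked tokens and targets."""
    masked = list(tokens)
    targets = {}
    for i in positions:
        if tokens[i] in SPECIAL_TOKENS:
            continue  # special tokens are never masked
        targets[i] = tokens[i]  # the model is trained to predict this original token
        masked[i] = "[MASK]"
    return masked, targets


tokens, attn = build_pair_input(["the", "cat", "sat"], ["it", "slept"], max_len=10)
masked, targets = mask_positions(tokens, [2, 5])
print(masked)
print(attn)
print(targets)
```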
Subword tokenization solves the rare-word problem: a word missing from the vocabulary is broken into smaller pieces that are in it
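A sketch of one common splitting strategy, greedy longest-match-first, which is the approach BERT's WordPiece tokenizer uses at inference time. The toy vocabulary and the function name are made up for illustration; `##` marks a piece that continues a word rather than starting one.

```python
def greedy_subword_split(word, vocab, unk="[UNK]"):
    """Split `word` into the longest known pieces, scanning left to right.

    Non-initial pieces carry a '##' prefix (the WordPiece convention).
    Returns [unk] if some part of the word cannot be matched at all.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest candidate first and shrink it until it is in the vocabulary.
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary; the rare word "unhappiness" itself is not in it.
vocab = {"un", "happy", "##happi", "##ness", "##s", "cat"}
print(greedy_subword_split("unhappiness", vocab))  # ['un', '##happi', '##ness']
print(greedy_subword_split("cats", vocab))         # ['cat', '##s']
```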
Masking (behavior)
Causal masking prevents each position in the decoder from attending to future (later) tokens
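A minimal sketch of causal masking applied to a matrix of attention scores, in plain Python. The score values are invented; blocked (future) positions are set to negative infinity before the softmax so they receive exactly zero weight.

```python
import math


def causal_mask(n):
    """mask[i][j] is True when query position i may attend to key position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def masked_softmax(scores, mask):
    """Row-wise softmax that gives zero weight to disallowed (future) positions."""
    weights = []
    for row, allowed in zip(scores, mask):
        # -inf for blocked entries makes exp() return exactly 0.0
        logits = [s if ok else -math.inf for s, ok in zip(row, allowed)]
        m = max(logits)  # subtract the row max for numerical stability
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


scores = [[1.0, 2.0, 3.0],
          [0.5, 0.5, 9.0],
          [1.0, 1.0, 1.0]]
w = masked_softmax(scores, causal_mask(3))
for row in w:
    print([round(x, 3) for x in row])
```

Note that the large score 9.0 in the second row has no effect, because it sits at a future position.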
Unigram tokenization starts with a large candidate vocabulary and prunes it, using EM to estimate each token's probability
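A simplified sketch of the Unigram idea, with several labeled assumptions: the starting vocabulary is every substring up to four characters, the EM step is the hard (Viterbi) variant rather than a full expectation over all segmentations, and pruning drops the least-used pieces instead of the pieces whose removal least increases the corpus loss. All function names are hypothetical.

```python
import math
from collections import Counter


def to_logp(counts):
    """Add-one smoothed unigram log-probabilities from piece counts."""
    total = sum(counts.values()) + len(counts)
    return {p: math.log((c + 1) / total) for p, c in counts.items()}


def viterbi_segment(word, logp):
    """Most probable split of `word` under unigram log-probabilities `logp`.

    Assumes every single character of `word` is in `logp`, so a split always exists.
    """
    n = len(word)
    best = [-math.inf] * (n + 1)  # best[i]: best log-prob of word[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last piece in that split
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(end):
            piece = word[start:end]
            if piece in logp and best[start] + logp[piece] > best[end]:
                best[end] = best[start] + logp[piece]
                back[end] = start
    pieces, end = [], n
    while end > 0:
        pieces.append(word[back[end]:end])
        end = back[end]
    return pieces[::-1], best[n]


def train_unigram(corpus, target_size, max_piece_len=4, keep_ratio=0.8):
    """Simplified Unigram training on a {word: frequency} corpus."""
    chars = {c for word in corpus for c in word}
    if target_size < len(chars):
        raise ValueError("target_size must cover every single character")
    # Start large: every substring up to max_piece_len, seeded with raw counts.
    counts = Counter()
    for word, freq in corpus.items():
        for i in range(len(word)):
            for j in range(i + 1, min(len(word), i + max_piece_len) + 1):
                counts[word[i:j]] += freq
    while True:
        # M-step: re-estimate probabilities from the latest counts.
        logp = to_logp(counts)
        # Hard E-step: count pieces in each word's single best (Viterbi) split.
        new_counts = Counter({p: 0 for p in counts})
        for word, freq in corpus.items():
            for piece in viterbi_segment(word, logp)[0]:
                new_counts[piece] += freq
        counts = new_counts
        if len(counts) <= target_size:
            return counts
        # Prune the least-used multi-character pieces; single characters always stay.
        multi = sorted((p for p in counts if len(p) > 1), key=lambda p: counts[p])
        n_drop = max(1, min(len(counts) - target_size, int(len(multi) * (1 - keep_ratio))))
        for p in multi[:n_drop]:
            del counts[p]


corpus = Counter({"hug": 10, "pug": 5, "pun": 12, "bun": 4, "hugs": 5})
vocab_counts = train_unigram(corpus, target_size=12)
logp = to_logp(vocab_counts)
print(sorted(vocab_counts))
print(viterbi_segment("hugs", logp)[0])
```

Because single characters are never pruned, every word in the corpus can always be segmented.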
[CLS] pooling uses the final embedding of the first token ([CLS]) as the sentence representation
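A small sketch of [CLS] pooling over made-up hidden states; with a real model they would come from the encoder's final layer. Mean pooling over non-padding tokens is included only for contrast, and both function names are hypothetical.

```python
def cls_pool(hidden_states):
    """Use the first position's vector ([CLS]) as the sentence representation."""
    return list(hidden_states[0])


def mean_pool(hidden_states, attention_mask):
    """Alternative for comparison: average the vectors of non-padding tokens."""
    kept = [h for h, m in zip(hidden_states, attention_mask) if m]
    return [sum(col) / len(kept) for col in zip(*kept)]


# Invented 3-dimensional hidden states for "[CLS] good movie [SEP] [PAD]"
hidden = [[0.9, -0.1, 0.3],  # [CLS]
          [0.2, 0.4, 0.0],   # good
          [0.1, 0.5, 0.2],   # movie
          [0.0, 0.0, 0.1],   # [SEP]
          [0.0, 0.0, 0.0]]   # [PAD]
mask = [1, 1, 1, 1, 0]
print(cls_pool(hidden))
print(mean_pool(hidden, mask))
```

[CLS] pooling ignores every other position directly and relies on the model having gathered sentence-level information into that one vector through attention.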
WordPiece tokenization is similar to BPE, but chooses each merge by how much it increases the likelihood of the training data rather than by raw pair frequency
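A sketch contrasting the two merge criteria on a toy corpus, using a commonly cited form of the WordPiece score, freq(ab) / (freq(a) × freq(b)). It omits the `##` prefixes and only picks the first merge; the corpus and function names are illustrative assumptions.

```python
from collections import Counter


def pair_stats(corpus):
    """Count symbol and adjacent-pair frequencies in a {tuple_of_symbols: freq} corpus."""
    unit, pair = Counter(), Counter()
    for symbols, freq in corpus.items():
        for s in symbols:
            unit[s] += freq
        for a, b in zip(symbols, symbols[1:]):
            pair[(a, b)] += freq
    return unit, pair


def best_bpe_merge(corpus):
    """BPE: merge the most frequent adjacent pair."""
    _, pair = pair_stats(corpus)
    return max(pair, key=lambda ab: pair[ab])


def best_wordpiece_merge(corpus):
    """WordPiece-style: favor pairs whose parts rarely occur apart."""
    unit, pair = pair_stats(corpus)
    return max(pair, key=lambda ab: pair[ab] / (unit[ab[0]] * unit[ab[1]]))


corpus = {tuple("hug"): 10, tuple("pug"): 5, tuple("pun"): 12,
          tuple("bun"): 4, tuple("hugs"): 5}
print(best_bpe_merge(corpus))        # ('u', 'g')
print(best_wordpiece_merge(corpus))  # ('g', 's')
```

BPE merges the most frequent pair, `u` + `g`, while the WordPiece score prefers `g` + `s`, because `s` never appears without a preceding `g`.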
Large language model
LLMs can generate, summarize, translate, and analyze text in many contexts