Context window limit restricts the model's input size to a fixed number of tokens for processing

What the context window limit means: maximum number of tokens the model can process at once

The context window limit caps the number of tokens the model can process in a single pass; text beyond that fixed limit has to be truncated, summarized, or split
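A minimal sketch of what the limit means in practice, assuming the window is shared between the prompt and the tokens the model will generate. The function name `fit_to_context` and the keep-the-most-recent-tokens policy are illustrative assumptions, not something the card prescribes.

```python
def fit_to_context(prompt_tokens, max_context, reserve_for_output):
    """Keep the most recent prompt tokens so prompt + output fits the window."""
    # Assumption: the window covers both the prompt and the generated tokens.
    budget = max_context - reserve_for_output
    if budget <= 0:
        raise ValueError("no room left for the prompt")
    # Drop the oldest tokens first; a short prompt is returned unchanged.
    return prompt_tokens[-budget:]


tokens = list(range(10))  # stand-in for 10 token ids
kept = fit_to_context(tokens, max_context=8, reserve_for_output=3)
print(kept)  # [5, 6, 7, 8, 9]
```

Dropping the oldest tokens is only one possible policy; summarizing or chunking the input are common alternatives.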

Related concepts

What the compute-optimal training ratio is: roughly 20 tokens per parameter

The compute-optimal training ratio is approximately 20 training tokens per model parameter
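The rule of thumb is simple arithmetic. The sketch below applies the roughly 20 tokens per parameter figure to a hypothetical 7-billion-parameter model; the constant and function names are illustrative.

```python
TOKENS_PER_PARAM = 20  # approximate ratio stated on the card


def compute_optimal_tokens(n_params):
    """Estimate the training-token budget for a model with n_params parameters."""
    return TOKENS_PER_PARAM * n_params


# Hypothetical 7B-parameter model: about 140 billion training tokens.
print(compute_optimal_tokens(7_000_000_000))  # 140000000000
```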

What BPE tokenization does: iteratively merges the most frequent byte pairs

BPE tokenization merges the most frequent byte pairs iteratively to create subword units
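The merge loop can be sketched in a few lines. This toy version works on characters rather than raw bytes and uses a tiny in-memory corpus, both simplifying assumptions; the helper names are hypothetical, and production tokenizers add pre-tokenization and many optimizations.

```python
from collections import Counter


def most_frequent_pair(words):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get) if pairs else None


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(corpus, num_merges):
    """Learn up to num_merges merge rules from a list of words."""
    words = Counter(tuple(w) for w in corpus)  # each word as a tuple of symbols
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        merges.append(pair)
        words = merge_pair(words, pair)
    return merges, words


merges, words = learn_bpe(["low", "low", "lower", "lowest"], num_merges=2)
print(merges)
print(words)
```

After two merges the frequent prefix "low" has become a single subword unit, while the rarer suffixes remain split into characters.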

Why attention is O(n²) in sequence length: every token attends to every other token

Attention's cost arises from pairwise token interactions: n tokens produce an n × n matrix of attention scores, so compute grows quadratically with sequence length
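To see where the quadratic term comes from, the sketch below computes scaled dot-product scores for every query-key pair and counts the entries. It is plain Python for illustration only, with made-up 2-dimensional vectors; real implementations use batched matrix multiplies.

```python
import math


def attention_scores(queries, keys):
    """Scaled dot-product score for every (query, key) pair."""
    d = len(keys[0])
    return [
        [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        for q in queries
    ]


def count_score_entries(n):
    """Number of pairwise scores for a sequence of n tokens."""
    x = [[1.0, 0.0]] * n  # placeholder token vectors
    scores = attention_scores(x, x)
    return sum(len(row) for row in scores)


for n in (4, 8, 16):
    print(n, count_score_entries(n))  # doubling n quadruples the count
```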

What continuous batching does: adds new requests to a running batch without waiting for the current batch to finish

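Continuous batching admits new requests into the running batch as soon as a slot frees up, instead of waiting for every sequence in the batch to finish, which improves throughput and hardware utilization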
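A toy scheduler makes the idea concrete. It is a sketch under simplifying assumptions: each decoding step produces one token per running request, request lengths are known in advance, and the function name and data layout are hypothetical rather than taken from any serving framework.

```python
from collections import deque


def continuous_batching(requests, max_batch):
    """requests: list of (name, tokens_to_generate). Returns {name: finish_step}."""
    waiting = deque(requests)
    running = {}   # name -> tokens still to generate
    finished = {}  # name -> decoding step at which the request completed
    step = 0
    while waiting or running:
        # Fill free slots before every decoding step, not once per batch.
        while waiting and len(running) < max_batch:
            name, length = waiting.popleft()
            running[name] = length
        step += 1
        for name in list(running):
            running[name] -= 1
            if running[name] <= 0:
                finished[name] = step
                del running[name]
    return finished


result = continuous_batching([("a", 1), ("b", 4), ("c", 2)], max_batch=2)
print(result)  # {'a': 1, 'c': 3, 'b': 4}
```

Request "c" starts at step 2, as soon as "a" frees its slot. Under static batching it would have to wait for "b" to finish at step 4 and would complete at step 6.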

What 300-dim word2vec encodes: word co-occurrence patterns learned with a skip-gram window

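A 300-dimensional word2vec vector encodes the contexts a word appears in: skip-gram training predicts the words that co-occur within a sliding window, so words used in similar contexts end up with similar vectors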
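The training signal comes from (center, context) pairs drawn from a sliding window. The sketch below shows only that pair-generation step, with a hypothetical helper name and a three-word sentence; the embedding training itself is omitted.

```python
def skipgram_pairs(tokens, window):
    """Return (center, context) pairs for every token within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


pairs = skipgram_pairs("the cat sat".split(), window=1)
print(pairs)
# [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat')]
```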

What weight tying does in language models: shares embedding and output projection matrices

Weight tying reuses the same matrix for the input embedding and the output projection, which reduces the model's parameter count
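A minimal sketch of the sharing, using plain Python lists instead of a tensor library. The class name `TiedLM` and the tiny 3-token, 2-dimensional matrix are illustrative assumptions; the point is that a single matrix serves both the embedding lookup and the logit computation.

```python
class TiedLM:
    """Toy model in which one matrix is both the embedding and the output projection."""

    def __init__(self, embedding):
        self.embedding = embedding  # vocab_size x d_model, stored once

    def embed(self, token_id):
        # Input side: look up the row for this token.
        return self.embedding[token_id]

    def logits(self, hidden):
        # Output side: dot the hidden state with every row of the same matrix.
        return [sum(h * w for h, w in zip(hidden, row)) for row in self.embedding]


model = TiedLM([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(model.logits(model.embed(2)))  # [1.0, 1.0, 2.0]
```

With untied weights the model would store two vocab_size x d_model matrices; tying keeps one.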
