CDN

A CDN (content delivery network) caches content at edge locations close to users, which cuts latency and reduces load on the origin server
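
A minimal sketch of the edge-caching idea, assuming a simple time-to-live policy. The `EdgeCache` class, the `fetch_from_origin` callable, and the `ttl_seconds` parameter are hypothetical names for illustration, not any real CDN's API.

```python
import time


class EdgeCache:
    """Toy edge cache: serve locally when fresh, otherwise fetch from the origin."""

    def __init__(self, fetch_from_origin, ttl_seconds=60.0, clock=time.monotonic):
        self.fetch_from_origin = fetch_from_origin  # hypothetical origin fetcher
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.store = {}  # path -> (expires_at, body)

    def get(self, path):
        now = self.clock()
        entry = self.store.get(path)
        if entry is not None and entry[0] > now:
            return entry[1], "HIT"  # served from the edge, no origin round trip
        body = self.fetch_from_origin(path)  # miss or expired: go to the origin
        self.store[path] = (now + self.ttl_seconds, body)
        return body, "MISS"
```

The first request for a path pays the origin round trip; later requests within the TTL are answered at the edge.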

Related concepts

Paged attention (vLLM)

Paged attention (vLLM) improves serving throughput by storing the KV cache in fixed-size, non-contiguous pages (blocks); this reduces memory fragmentation and waste, so more sequences fit in a batch
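
A rough sketch of the bookkeeping behind paged KV caching, assuming a per-sequence block table that maps logical blocks to physical ones. `PagedKVCache` and its methods are hypothetical names; this is not vLLM's actual implementation and it stores no tensors, only block ids.

```python
class PagedKVCache:
    """Toy block allocator: maps each sequence's logical KV blocks to physical blocks."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # physical block ids
        self.block_tables = {}  # seq_id -> list of physical block ids
        self.lengths = {}       # seq_id -> number of tokens stored

    def append_token(self, seq_id):
        """Reserve a slot for one more token; return (physical_block, offset)."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        if length % self.block_size == 0:  # first token, or last block is full
            if not self.free_blocks:
                raise MemoryError("no free KV-cache blocks")
            table.append(self.free_blocks.pop())  # any free block will do
        self.lengths[seq_id] = length + 1
        return table[-1], length % self.block_size

    def free(self, seq_id):
        """Return all of a finished sequence's blocks to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)
```

Because blocks are allocated one at a time and need not be adjacent, a sequence wastes at most one partially filled block instead of a large contiguous reservation.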

Consistent hashing

Consistent hashing distributes keys across nodes on a hash ring, so only a small fraction of keys (about 1/N for N nodes) is remapped when a node joins or leaves
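
A small sketch of a hash ring with virtual nodes, to show why few keys move when membership changes. The `HashRing` class, the `vnodes` parameter, and the choice of MD5 as the hash are illustrative assumptions.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes  # virtual nodes per physical node, to even out load
        self.ring = []        # sorted list of (position, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):
            bisect.insort(self.ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self.ring = [(h, n) for h, n in self.ring if n != node]

    def get_node(self, key):
        """Return the first node clockwise from the key's position."""
        if not self.ring:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self.ring, (_hash(key), "")) % len(self.ring)
        return self.ring[idx][1]
```

When a node is added, the only keys that change owner are those that now land on the new node's positions; every other key keeps its previous node.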

BFS vs DFS

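BFS finds the shortest path in unweighted graphs; DFS typically uses less memory on wide graphs, since it follows one branch at a time instead of holding a whole frontier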
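
A minimal sketch of both traversals, assuming the graph is a dict mapping each node to a list of neighbors. The function names are illustrative.

```python
from collections import deque


def bfs_shortest_path(graph, start, goal):
    """Return a shortest path (fewest edges) from start to goal, or None."""
    parents = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk parent links back to start
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


def dfs_order(graph, start):
    """Return nodes in depth-first visiting order, using an explicit stack."""
    seen, stack, order = set(), [start], []
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        stack.extend(reversed(graph.get(node, ())))  # keep left-to-right order
    return order
```

BFS explores level by level, so the first time it reaches the goal it has used the fewest possible edges; DFS makes no such guarantee.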

CPU cache

The L1/L2 cache hierarchy reduces average main-memory access latency by keeping recently used data close to the core
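
The effect can be estimated with the standard average memory access time formula (hit time plus miss rate times miss penalty). The sketch below applies it to two cache levels; the function names are illustrative, and any latencies or miss rates passed in are assumptions supplied by the caller, not measurements.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """AMAT = hit time + miss rate * miss penalty (all times in the same unit)."""
    return hit_time + miss_rate * miss_penalty


def two_level_amat(l1_hit, l1_miss_rate, l2_hit, l2_miss_rate, mem_latency):
    """An L1 miss costs an L2 access, which may in turn miss to main memory."""
    l1_miss_penalty = average_access_time(l2_hit, l2_miss_rate, mem_latency)
    return average_access_time(l1_hit, l1_miss_rate, l1_miss_penalty)
```

With made-up inputs such as a 1-cycle L1, a 10-cycle L2, and 100-cycle memory, the average stays close to the L1 hit time as long as miss rates are low.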

Batch norm vs layer norm

Batch norm (BN) normalizes each feature across the batch; layer norm (LN) normalizes each example across its features. LN does not depend on batch statistics, so it handles variable-length sequences
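
A plain-Python sketch of the two normalization axes, assuming the input is a list of rows shaped batch by features. It leaves out the learned scale and shift and the running statistics that real BN layers keep; the function names are illustrative.

```python
from statistics import fmean, pvariance


def _normalize(values, eps=1e-5):
    """Shift to zero mean and scale to (roughly) unit variance."""
    mean = fmean(values)
    var = pvariance(values, mu=mean)
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


def batch_norm(x):
    """Normalize each feature (column) across all examples in the batch."""
    columns = [_normalize(col) for col in zip(*x)]
    return [list(row) for row in zip(*columns)]


def layer_norm(x):
    """Normalize each example (row) across its own features."""
    return [_normalize(row) for row in x]
```

Layer norm gives the same result for an example whether it arrives alone or in a batch, while batch norm on a single-example batch collapses every feature to zero.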

Pre-LN vs Post-LN

Pre-LN: LayerNorm is applied before each sublayer (attention or feed-forward), inside the residual branch; Post-LN: LayerNorm is applied after the residual addition
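
The two orderings can be sketched on a single vector as follows. `sublayer` is a stand-in for attention or a feed-forward layer, and the function names are illustrative; learned LayerNorm parameters are left out.

```python
from statistics import fmean, pvariance


def layer_norm(x, eps=1e-5):
    """Normalize one vector to zero mean and (roughly) unit variance."""
    mean = fmean(x)
    var = pvariance(x, mu=mean)
    return [(v - mean) / (var + eps) ** 0.5 for v in x]


def add(a, b):
    """Element-wise sum, used for the residual connection."""
    return [ai + bi for ai, bi in zip(a, b)]


def pre_ln_block(x, sublayer):
    """Pre-LN: x + Sublayer(LN(x)); the residual path is left unnormalized."""
    return add(x, sublayer(layer_norm(x)))


def post_ln_block(x, sublayer):
    """Post-LN: LN(x + Sublayer(x)); the sum itself is normalized."""
    return layer_norm(add(x, sublayer(x)))
```

In the Pre-LN form the input passes through the residual connection untouched, whereas the Post-LN output is always normalized.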
