A CDN caches content at edge locations close to users
Image: Földhegy, CC BY-SA 3.0, via Wikimedia Commons
Paged attention (vLLM) improves serving throughput
Paged attention (vLLM) improves serving throughput by storing the KV cache in fixed-size, non-contiguous pages, which reduces memory fragmentation and lets more requests be batched together
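To make the idea of non-contiguous KV-cache pages concrete, here is a minimal sketch of a block table that hands out fixed-size pages on demand. It is a toy illustration only, not vLLM's API: the class `PagedKVCache`, its methods, and the parameters `num_blocks` and `block_size` are all hypothetical names chosen for this example.

```python
class PagedKVCache:
    """Toy page allocator (hypothetical; not vLLM's actual implementation)."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free = list(range(num_blocks))  # pool of free physical block ids
        self.block_tables = {}               # sequence id -> physical block ids
        self.lengths = {}                    # sequence id -> tokens stored

    def append_token(self, seq_id):
        """Reserve a slot for one more token; return (physical_block, offset)."""
        table = self.block_tables.setdefault(seq_id, [])
        n = self.lengths.get(seq_id, 0)
        if n % self.block_size == 0:         # no block yet, or last block is full
            if not self.free:
                raise MemoryError("no free KV blocks")
            table.append(self.free.pop())
        self.lengths[seq_id] = n + 1
        return table[n // self.block_size], n % self.block_size

    def release(self, seq_id):
        """Return all of a finished sequence's blocks to the free pool."""
        self.free.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


cache = PagedKVCache(num_blocks=4, block_size=2)
# Two sequences grow in an interleaved order, as they would during batched decoding.
slots = [cache.append_token(s) for s in ["a", "b", "a", "a", "b"]]
table_a = list(cache.block_tables["a"])  # physical blocks of "a", not adjacent
cache.release("a")
```

Because each sequence only claims a new page when its last one is full, at most one partially filled page per sequence is wasted, and pages freed by one request can be reused by any other.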
Consistent hashing: minimizes remapping when nodes join/leave
Consistent hashing distributes keys across nodes so that only a small fraction of keys (about K/N on average, for K keys and N nodes) is remapped when a node joins or leaves
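The remapping property can be checked with a small hash ring. This is a minimal sketch under stated assumptions: the class `HashRing`, the use of MD5 for ring positions, and the choice of 100 virtual nodes per server are illustrative choices, not taken from any particular system.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Map a string to a ring position (first 8 bytes of its MD5 digest)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes   # virtual nodes per server, to even out the load
        self._points = []      # sorted ring positions
        self._owner = {}       # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            point = ring_hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key):
        # The owner is the first ring position clockwise from the key's hash.
        i = bisect.bisect(self._points, ring_hash(key)) % len(self._points)
        return self._owner[self._points[i]]


keys = [f"key-{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.lookup(k) for k in keys}
ring.add("node-d")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Every key in `moved` now belongs to `node-d`; keys never shuffle between the existing nodes. With a naive `hash(key) % N` scheme, changing N would instead remap most keys.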
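BFS vs DFS: BFS finds the shortest path in unweighted graphs; DFS typically uses less memory
BFS finds the shortest path in unweighted graphs; DFS typically uses less memory on wide search spaces, since it tracks the current path rather than an entire frontier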
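A short sketch makes the difference visible on a small graph. The function names and the example `graph` are made up for illustration; the DFS variant keeps whole paths on its stack for simplicity, so it shows the path-length difference rather than the memory saving.

```python
from collections import deque


def bfs_shortest_path(graph, start, goal):
    """Return a shortest path (fewest edges) from start to goal, or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:   # walk parent links back to the start
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in parent:     # first visit is via a shortest route
                parent[nxt] = node
                queue.append(nxt)
    return None


def dfs_path(graph, start, goal):
    """Return some path from start to goal (not necessarily the shortest)."""
    stack = [(start, [start])]
    seen = set()
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                stack.append((nxt, path + [nxt]))
    return None


# A has a direct edge to E and a longer route through B, C, D.
graph = {"A": ["E", "B"], "B": ["C"], "C": ["D"], "D": ["E"], "E": []}
bfs_result = bfs_shortest_path(graph, "A", "E")
dfs_result = dfs_path(graph, "A", "E")
```

Here BFS returns the direct two-node path, while this DFS dives down the longer branch first and returns a five-node path.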
CPU cache
L1/L2 cache hierarchy reduces average main-memory access latency by keeping recently used data close to the core
Batch norm vs layer norm: BN across batch, LN across features
Batch norm (BN) normalizes each feature across the batch; layer norm (LN) normalizes each sample across its features. Because LN does not depend on batch statistics, it handles variable-length sequences and small batches
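The two normalization axes can be sketched in a few lines of plain Python. This is a simplified illustration: it omits the learnable scale and shift, uses training-mode batch statistics only (no running averages), and the names `normalize`, `layer_norm`, `batch_norm`, and `EPS` are chosen for this example.

```python
from statistics import fmean, pstdev

EPS = 1e-5  # small constant for numerical stability (assumed value)


def normalize(values):
    """Shift and scale a list of numbers to zero mean and unit variance."""
    mu = fmean(values)
    var = pstdev(values) ** 2
    return [(v - mu) / (var + EPS) ** 0.5 for v in values]


def layer_norm(batch):
    # Each row (one sample) is normalized across its own features.
    return [normalize(row) for row in batch]


def batch_norm(batch):
    # Each column (one feature) is normalized across the samples in the batch.
    cols = [normalize(list(col)) for col in zip(*batch)]
    return [list(row) for row in zip(*cols)]


x = [[1.0, 2.0, 3.0],
     [10.0, 20.0, 30.0]]
ln = layer_norm(x)
bn = batch_norm(x)
```

Running `layer_norm` on the first sample alone gives the same result as running it on the full batch, which is why LN is unaffected by batch size or by how other sequences in the batch are padded.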
Pre-LN
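Pre-LN: LayerNorm is applied to the input of each sublayer (attention or feed-forward), inside the residual branch; Post-LN: LayerNorm is applied after the residual addition that follows each sublayer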
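The placement difference is easiest to see as two one-line residual blocks. This is a minimal sketch on a single vector: `toy_sublayer` is a hypothetical stand-in for attention or a feed-forward layer, and the LayerNorm here has no learnable gain or bias.

```python
from statistics import fmean, pstdev

EPS = 1e-5  # small constant for numerical stability (assumed value)


def layer_norm(x):
    """Normalize one vector to zero mean and unit variance."""
    mu, var = fmean(x), pstdev(x) ** 2
    return [(v - mu) / (var + EPS) ** 0.5 for v in x]


def toy_sublayer(x):
    """Hypothetical stand-in for attention or a feed-forward layer."""
    return [2.0 * v for v in x]


def pre_ln_block(x, sublayer):
    # x + Sublayer(LayerNorm(x)): the residual path itself is never normalized.
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]


def post_ln_block(x, sublayer):
    # LayerNorm(x + Sublayer(x)): normalization sits on the residual path.
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])


x = [1.0, 2.0, 3.0, 4.0]
pre = pre_ln_block(x, toy_sublayer)
post = post_ln_block(x, toy_sublayer)
```

In the Pre-LN output the input's mean passes through the skip connection unchanged, while the Post-LN output is re-centered to zero mean after every block.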