
Tiling in matrix multiplication optimizes cache usage by partitioning matrices into submatrices
Tiling partitions the matrices into submatrices (tiles) small enough to stay in cache, so each loaded element is reused many times before it is evicted.
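The partitioning into submatrices can be sketched as a blocked loop nest. This is a minimal illustration, assuming matrices stored as lists of lists; the function name `matmul_tiled` and the `tile` parameter are made up for this example. Pure Python will not show the cache speedup itself, so read it only as the loop structure that a C or CUDA kernel would use.

```python
def matmul_tiled(a, b, tile=2):
    """Multiply a (n x k) by b (k x m), both lists of lists, one tile at a time."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    # The three outer loops walk over tiles. The three inner loops stay inside
    # one tile of each matrix, so that small working set is what must fit in cache.
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for p in range(p0, min(p0 + tile, k)):
                        a_ip = a[i][p]  # loaded once, reused across the tile row
                        for j in range(j0, min(j0 + tile, m)):
                            c[i][j] += a_ip * b[p][j]
    return c
```

The `min(...)` bounds handle matrices whose sizes are not multiples of the tile size. In practice the tile size is chosen so that one tile of each of the three matrices fits in the target cache level at once.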
Why most transformer operations are memory-bound, not compute-bound
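Large matrix multiplications can be compute-bound, but many transformer operations (elementwise activations, normalization, softmax, and the matrix-vector products of token-by-token decoding) perform few arithmetic operations per byte moved, so memory bandwidth rather than compute sets their speed.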
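Whether an operation is memory-bound can be estimated from its arithmetic intensity: FLOPs performed per byte moved, compared against the hardware's ratio of peak compute to peak bandwidth. The sketch below assumes an idealized matmul that reads each input and writes the output exactly once. The function names and the hardware figures used in the usage note are illustrative assumptions, not measurements of any real device.

```python
def matmul_intensity(m, k, n, bytes_per_elem=2):
    """FLOPs per byte for an (m x k) @ (k x n) matmul, assuming ideal reuse."""
    flops = 2 * m * k * n                                    # one multiply + one add per term
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)   # read A, read B, write C
    return flops / bytes_moved


def is_memory_bound(intensity, peak_flops, peak_bandwidth):
    """True when the op sits below the roofline ridge point (FLOPs per byte)."""
    return intensity < peak_flops / peak_bandwidth


# Hypothetical accelerator: 100 TFLOP/s of compute, 1 TB/s of memory bandwidth.
PEAK_FLOPS = 100e12
PEAK_BANDWIDTH = 1e12

decode_step = matmul_intensity(1, 4096, 4096)     # one token times a weight matrix
prefill = matmul_intensity(4096, 4096, 4096)      # many tokens at once
```

Under these assumed numbers the single-token decode step lands far below the ridge point (memory-bound), while the large batched matmul lands above it (compute-bound).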
Why memory coalescing matters — adjacent threads reading adjacent memory addresses
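On a GPU, when adjacent threads in a warp read adjacent addresses, the hardware combines their loads into a few wide memory transactions; strided or scattered accesses need many more transactions and waste bandwidth.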
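The effect of the access pattern can be modeled by counting how many memory segments one group of threads touches. This is a simplified model, not GPU code: the warp size of 32, the 4-byte elements, and the 128-byte segment size are assumptions chosen for illustration, and real hardware has further details (sector sizes, alignment, caching) that it ignores.

```python
def transactions_per_warp(stride, warp_size=32, elem_bytes=4, segment_bytes=128):
    """Count distinct memory segments touched when thread t reads element t * stride."""
    addresses = [t * stride * elem_bytes for t in range(warp_size)]
    # Each distinct segment costs one transaction in this simplified model.
    segments = {addr // segment_bytes for addr in addresses}
    return len(segments)


coalesced = transactions_per_warp(stride=1)    # adjacent threads, adjacent addresses
strided = transactions_per_warp(stride=32)     # every thread lands in its own segment
```

In this model the unit-stride pattern is served by a single transaction, while the stride-32 pattern needs one transaction per thread even though each thread still wants only 4 bytes.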
How do lock-free data structures manage concurrent access to shared memory in a multithreaded environment?
Lock-free data structures coordinate threads with atomic operations such as compare-and-swap instead of locks: a thread that loses a race retries, and some thread always makes progress.
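The retry pattern behind such structures can be sketched with a Treiber-style stack. One loud caveat: Python exposes no compare-and-swap primitive, so the `AtomicRef` class below is a hypothetical stand-in that emulates the atomicity of a single CAS with a small internal lock. It illustrates the read, attempt, retry loop only; genuinely lock-free code would rely on a hardware atomic instruction.

```python
import threading


class AtomicRef:
    """Hypothetical stand-in for a hardware atomic reference (CAS is emulated)."""

    def __init__(self, value=None):
        self._value = value
        self._guard = threading.Lock()  # emulation only, not part of the technique

    def get(self):
        return self._value

    def compare_and_set(self, expected, new):
        # Succeeds only if nobody changed the value since `expected` was read.
        with self._guard:
            if self._value is expected:
                self._value = new
                return True
            return False


class Node:
    def __init__(self, item, next_node):
        self.item = item
        self.next = next_node


class TreiberStack:
    def __init__(self):
        self._head = AtomicRef(None)

    def push(self, item):
        while True:  # retry until our CAS wins
            old_head = self._head.get()
            if self._head.compare_and_set(old_head, Node(item, old_head)):
                return

    def pop(self):
        while True:
            old_head = self._head.get()
            if old_head is None:
                return None  # empty stack
            if self._head.compare_and_set(old_head, old_head.next):
                return old_head.item
```

Callers never hold a lock across the whole operation: each one reads the head, prepares its update privately, and publishes it with a single CAS, retrying if another thread got there first.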
How KV-cache reduces redundant computation in autoregressive generation
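During autoregressive generation, the KV-cache stores the attention keys and values of tokens already processed, so each new step computes them only for the newest token instead of for the whole prefix.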
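The saving can be sketched with a toy single-head attention step. Everything here is an illustrative assumption: `project` is a stand-in for learned projection matrices (it just scales the vector), and the class and function names are invented for the example. The point is only that the cached and uncached paths produce the same output while doing different amounts of key/value work.

```python
import math


def project(x, scale):
    # Toy stand-in for a learned linear projection (hypothetical).
    return [scale * v for v in x]


def attend(q, keys, values):
    """Softmax attention of one query over all keys and values."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(len(q)) for k in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


class KVCache:
    def __init__(self):
        self.keys, self.values = [], []
        self.projections = 0  # counts key/value computations

    def step(self, x):
        # Only the newest token is projected; earlier keys and values are reused.
        self.keys.append(project(x, 0.5))
        self.values.append(project(x, 2.0))
        self.projections += 1
        return attend(project(x, 1.0), self.keys, self.values)


def step_without_cache(tokens):
    # Recomputes keys and values for the whole prefix at every step.
    keys = [project(t, 0.5) for t in tokens]
    values = [project(t, 2.0) for t in tokens]
    return attend(project(tokens[-1], 1.0), keys, values), len(tokens)
```

Over a sequence of n tokens the cached path performs n key/value computations, while the uncached path performs 1 + 2 + ... + n. The price is memory: the cache grows with sequence length.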
What LSM trees optimize: write-heavy workloads by buffering writes in memory
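LSM trees buffer writes in an in-memory table and flush it to disk as sorted, immutable files, turning random writes into sequential ones; background compaction later merges those files.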
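The write path can be sketched with a tiny in-memory model. This is a simplification under stated assumptions: the class name `TinyLSM` is invented, the sorted runs are plain Python lists standing in for on-disk files, and compaction, deletes, and write-ahead logging are left out.

```python
import bisect


class TinyLSM:
    def __init__(self, memtable_limit=2):
        self.memtable = {}            # in-memory write buffer
        self.memtable_limit = memtable_limit
        self.runs = []                # flushed sorted runs, newest last

    def put(self, key, value):
        # Writes only touch memory until the buffer fills up.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # One sequential write of a sorted, immutable run (a list stands in for a file).
        self.runs.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Search newest run first so later writes shadow older ones.
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None
```

Reads are the cost of this design: a lookup may have to check the buffer and then several runs, which is why real systems add compaction and per-file filters to keep the number of runs searched small.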
What operator fusion does at the compiler level: merges adjacent ops to reduce memory traffic
Operator fusion combines adjacent operations into a single kernel, so intermediate results stay in registers or cache instead of being written to and read back from main memory.
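The difference in memory traffic can be sketched by writing the same scale, shift, and ReLU chain two ways. The function names are invented for this example, and `memory_traffic` is a deliberately crude model that assumes every pass reads and writes the full array once. A real compiler performs this rewrite on its intermediate representation rather than on Python source.

```python
def unfused(xs, scale, bias):
    """Three separate passes, each materializing a full intermediate list."""
    scaled = [x * scale for x in xs]        # pass 1: read xs, write scaled
    shifted = [s + bias for s in scaled]    # pass 2: read scaled, write shifted
    return [max(0.0, v) for v in shifted]   # pass 3: read shifted, write output


def fused(xs, scale, bias):
    """One pass: intermediates live only in a temporary per element."""
    return [max(0.0, x * scale + bias) for x in xs]


def memory_traffic(n, num_ops, is_fused):
    """Elements read plus written, assuming each pass reads n and writes n."""
    passes = 1 if is_fused else num_ops
    return 2 * n * passes
```

Both versions compute the same result; under the crude model above, fusing three elementwise operations cuts the elements moved to one third.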