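Memory coalescing combines the global-memory accesses of the threads in a warp into fewer memory transactions when they touch contiguous, aligned addresses, improving effective memory bandwidth in GPU kernels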
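To make the transaction argument concrete, here is a small Python model that counts how many aligned memory segments one 32-thread warp touches. The 128-byte segment size, the 4-byte elements, and the helper names are simplifying assumptions; real GPUs also split requests into 32-byte sectors, and caching behavior varies by architecture.

```python
WARP_SIZE = 32

def warp_addresses(stride, elem_bytes=4, base=0):
    """Byte address read by each thread when thread t loads element t * stride."""
    return [base + t * stride * elem_bytes for t in range(WARP_SIZE)]

def segments_touched(addresses, segment_bytes=128):
    """Number of distinct aligned segments the warp's loads fall into (simplified model)."""
    return len({addr // segment_bytes for addr in addresses})

print(segments_touched(warp_addresses(stride=1)))   # contiguous: 1 segment
print(segments_touched(warp_addresses(stride=32)))  # strided: 32 segments
```

Contiguous loads fit in a single segment, while a stride of 32 elements sends every thread to a different segment. Misalignment also costs extra: starting the contiguous read at byte 64 spills it across two segments.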
What bank conflicts are in shared memory — multiple threads accessing the same bank
Shared memory bank conflicts arise when threads in the same warp access different addresses that map to the same bank, forcing those accesses to be serialized
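A short Python model of the bank mapping, assuming 32 banks of 4-byte words as on recent NVIDIA GPUs. The function names are hypothetical. It reports the conflict degree: the largest number of distinct words any single bank must serve for one warp-wide access.

```python
from collections import defaultdict

NUM_BANKS = 32   # assumption: 32 banks, each 4 bytes wide
WORD_BYTES = 4
WARP_SIZE = 32

def conflict_degree(byte_addresses):
    """1 means conflict-free. Threads reading the same word share it via broadcast,
    so only distinct words per bank count toward the conflict."""
    words_per_bank = defaultdict(set)
    for addr in byte_addresses:
        word = addr // WORD_BYTES
        words_per_bank[word % NUM_BANKS].add(word)
    return max(len(words) for words in words_per_bank.values())

def column_read(row_len, col=0):
    """Thread t reads tile[t][col] from a row-major float tile with row_len floats per row."""
    return [(t * row_len + col) * WORD_BYTES for t in range(WARP_SIZE)]

print(conflict_degree(column_read(row_len=32)))  # every thread hits bank 0: 32-way conflict
print(conflict_degree(column_read(row_len=33)))  # padded rows: conflict-free
```

Padding each row by one element (declaring a 32x33 tile instead of 32x32) shifts successive rows onto different banks, which is a common way to remove this conflict.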
How tiling works in matrix multiplication — loading blocks into shared memory
Tiling in matrix multiplication partitions the matrices into submatrices (tiles) that are loaded into fast shared memory and reused by many threads, reducing global memory traffic
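This CPU-side Python sketch mirrors the loop structure of a tiled kernel. Each pair of tiles is "loaded" once (the list slices stand in for copies into shared memory) and then reused for every element of the output tile. The tile size and function names are illustrative assumptions; in a real CUDA kernel each thread of a block would compute one output element and the block would synchronize after each tile load.

```python
def naive_matmul(A, B):
    """Reference result: C[i][j] = sum over p of A[i][p] * B[p][j]."""
    return [[sum(A[i][p] * B[p][j] for p in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def tiled_matmul(A, B, tile=2):
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for i0 in range(0, n, tile):          # output tile rows
        for j0 in range(0, m, tile):      # output tile columns
            for k0 in range(0, k, tile):  # walk along the shared dimension
                # Load one tile of A and one tile of B (stand-in for shared memory)
                a_tile = [row[k0:k0 + tile] for row in A[i0:i0 + tile]]
                b_tile = [row[j0:j0 + tile] for row in B[k0:k0 + tile]]
                # Every loaded value is reused across the whole output tile
                for di, a_row in enumerate(a_tile):
                    for dj in range(len(b_tile[0])):
                        C[i0 + di][j0 + dj] += sum(
                            a_row[dk] * b_tile[dk][dj] for dk in range(len(a_row)))
    return C

print(tiled_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```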
How do lock-free data structures manage concurrent access to shared memory in a multithreaded environment?
Lock-free data structures coordinate concurrent access with atomic operations such as compare-and-swap instead of locks, guaranteeing that some thread always makes progress
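As an illustration of the compare-and-swap retry pattern, here is a Treiber stack, a classic lock-free stack, sketched in Python. CPython has no user-level CAS instruction, so the hypothetical AtomicRef class emulates a single atomic CAS step with a tiny internal lock. Only the stack algorithm around it is lock-free in structure; in C++ or Rust that class would be replaced by std::atomic or AtomicPtr.

```python
import threading

class AtomicRef:
    """Emulated CAS cell (assumption: stands in for a hardware compare-and-swap)."""
    def __init__(self, value=None):
        self._value = value
        self._guard = threading.Lock()  # emulates atomicity of one CAS only

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        with self._guard:
            if self._value is expected:
                self._value = new
                return True
            return False

class Node:
    __slots__ = ("value", "next")
    def __init__(self, value, next_node):
        self.value = value
        self.next = next_node

class TreiberStack:
    def __init__(self):
        self.head = AtomicRef(None)

    def push(self, value):
        while True:  # retry until no other thread changed head in between
            old = self.head.load()
            if self.head.compare_and_swap(old, Node(value, old)):
                return

    def pop(self):
        while True:
            old = self.head.load()
            if old is None:
                return None  # empty stack
            if self.head.compare_and_swap(old, old.next):
                return old.value

stack = TreiberStack()

def pusher(base):
    for i in range(1000):
        stack.push(base + i)

workers = [threading.Thread(target=pusher, args=(b * 1000,)) for b in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()

popped = []
while (item := stack.pop()) is not None:
    popped.append(item)
```

A failed CAS means another thread won the race, so the loser simply re-reads head and retries. In a language with manual memory management, pop also needs a strategy against the ABA problem and for safe node reclamation (for example hazard pointers); the garbage collector sidesteps both here.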
What LSM trees optimize: write-heavy workloads by buffering writes in memory
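LSM trees optimize write-heavy workloads by buffering writes in an in-memory table and flushing them to disk as sorted, immutable runs that are merged later by compaction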
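A toy, single-process Python model of the write path: writes land in a dict memtable, a full memtable is flushed as a sorted immutable run, reads check the newest data first, and compaction merges runs. The class name, the flush threshold, the tombstone marker, and the dictionary-based merge are simplifying assumptions; production LSM engines persist runs as on-disk files, add a write-ahead log and Bloom filters, and merge runs with streaming k-way merges.

```python
import bisect

TOMBSTONE = object()  # marker that records a delete

class TinyLSM:
    def __init__(self, memtable_limit=4):
        self.memtable = {}
        self.runs = []  # sorted (key, value) lists, oldest first; stand-in for on-disk files
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value  # writes only touch memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key):
        self.put(key, TOMBSTONE)

    def flush(self):
        self.runs.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            value = self.memtable[key]
            return None if value is TOMBSTONE else value
        for run in reversed(self.runs):  # newest run wins
            i = bisect.bisect_left(run, (key,))  # binary search on the sorted run
            if i < len(run) and run[i][0] == key:
                value = run[i][1]
                return None if value is TOMBSTONE else value
        return None

    def compact(self):
        merged = {}
        for run in self.runs:  # oldest to newest, so newer values overwrite older ones
            merged.update(run)
        # Merging every run into one lets tombstones be dropped safely
        self.runs = [sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE)]

db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)       # memtable full: flushed as the first run
db.put("a", 10)
db.delete("b")       # flushed as a second, newer run
db.put("c", 3)       # still in the memtable
print(db.get("a"), db.get("b"), db.get("c"))
db.compact()
```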
How KV-cache reduces redundant computation in autoregressive generation
The KV-cache stores the attention keys and values of earlier tokens, so each autoregressive step computes keys and values only for the newest token instead of recomputing the whole prefix
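The sketch below uses single-head attention with tiny random weights; all names and sizes are hypothetical, and random vectors stand in for token embeddings. Each step projects only the newest token into a key and a value, appends them to the cache, and attends over everything cached. The final output matches a full recomputation over the whole prefix.

```python
import math
import random

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def attend(q, keys, values):
    """Scaled dot-product attention of one query over all cached keys/values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(len(values[0]))]

class CachedAttention:
    def __init__(self, Wq, Wk, Wv):
        self.Wq, self.Wk, self.Wv = Wq, Wk, Wv
        self.keys, self.values = [], []
        self.kv_projections = 0  # how many tokens had K/V computed

    def step(self, x):
        # Only the newest token is projected; earlier K/V come from the cache
        self.keys.append(matvec(self.Wk, x))
        self.values.append(matvec(self.Wv, x))
        self.kv_projections += 1
        return attend(matvec(self.Wq, x), self.keys, self.values)

def uncached_last_output(Wq, Wk, Wv, xs):
    # Without a cache, K and V are recomputed for the entire prefix
    keys = [matvec(Wk, x) for x in xs]
    values = [matvec(Wv, x) for x in xs]
    return attend(matvec(Wq, xs[-1]), keys, values)

def rand_matrix(d):
    return [[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]

random.seed(0)
d = 4
Wq, Wk, Wv = rand_matrix(d), rand_matrix(d), rand_matrix(d)
tokens = [[random.gauss(0, 1) for _ in range(d)] for _ in range(5)]

attn = CachedAttention(Wq, Wk, Wv)
cached_outputs = [attn.step(x) for x in tokens]
reference = uncached_last_output(Wq, Wk, Wv, tokens)
```

Over 5 steps the cached version computes 5 key/value projections, whereas recomputing the prefix at every step would compute 1 + 2 + 3 + 4 + 5 = 15. The cache trades that compute for memory that grows linearly with sequence length.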
What a thread block is in CUDA — a group of threads that share shared memory
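A CUDA thread block is a group of threads that runs on a single streaming multiprocessor, can synchronize with __syncthreads(), and shares the block's shared memory; global memory, by contrast, is visible to every thread in the grid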
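A sequential Python emulation of a 1-D launch in which each thread block sums its own slice of an array. It shows the global index blockIdx.x * blockDim.x + threadIdx.x, a shared array that exists once per block, and the points where a CUDA kernel would call __syncthreads(). The function name and the power-of-two block size are assumptions of this sketch; on a GPU the threads of a block run concurrently rather than in Python loops.

```python
def launch_block_sum(data, block_dim=4):
    """Emulated kernel launch: returns one partial sum per thread block.
    Assumes block_dim is a power of two (needed by the tree reduction)."""
    grid_dim = (len(data) + block_dim - 1) // block_dim  # ceil-divide, as in a real launch
    block_sums = []
    for block_idx in range(grid_dim):
        shared = [0] * block_dim  # per-block shared memory, fresh for each block
        # Phase 1: each thread loads one element via its global index
        for thread_idx in range(block_dim):
            i = block_idx * block_dim + thread_idx
            shared[thread_idx] = data[i] if i < len(data) else 0  # bounds check
        # __syncthreads() would go here before threads read each other's values
        # Phase 2: tree reduction inside the block
        stride = block_dim // 2
        while stride > 0:
            for thread_idx in range(stride):
                shared[thread_idx] += shared[thread_idx + stride]
            stride //= 2  # a real kernel synchronizes after every halving step
        block_sums.append(shared[0])  # thread 0 writes the block's result
    return block_sums

print(launch_block_sum(list(range(10)), block_dim=4))
```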