L1/L2 cache hierarchy reduces global memory latency
The L1/L2 cache hierarchy is designed to store copies of frequently accessed data, reducing the need to access slower main memory. This results in faster data retrieval for the CPU, enhancing overall system performance.
The L1 cache is the smallest and fastest level and sits closest to the execution units, while the L2 cache is larger but takes longer to access. A request checks L1 first, falls back to L2 on a miss, and only goes out to memory when both miss, which balances speed against capacity.
Modern CPUs and GPUs both rely on this hierarchy to cut the time spent waiting on off-chip memory (main memory on a CPU, global memory on a GPU). A request that hits in cache never pays the full memory latency, so a high hit rate directly lowers the average access time, which is crucial for performance and efficiency.
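To make the latency saving concrete, here is a minimal sketch of the standard average memory access time (AMAT) calculation for a two-level cache. The function name and all hit times and miss rates are illustrative assumptions, not measurements of any real chip.

```python
def average_access_time(l1_hit_ns, l1_miss_rate, l2_hit_ns, l2_miss_rate, memory_ns):
    """Average memory access time (AMAT) for an L1 + L2 hierarchy, in nanoseconds."""
    # An L1 miss pays for the L2 lookup; an L2 miss additionally pays the trip to memory.
    l1_miss_penalty = l2_hit_ns + l2_miss_rate * memory_ns
    return l1_hit_ns + l1_miss_rate * l1_miss_penalty


# Assumed, round numbers chosen only for illustration.
memory_ns = 100.0
with_caches = average_access_time(
    l1_hit_ns=1.0, l1_miss_rate=0.10,
    l2_hit_ns=4.0, l2_miss_rate=0.25,
    memory_ns=memory_ns,
)

print(f"every access goes to memory: {memory_ns:.1f} ns")
print(f"with L1 + L2 caches:         {with_caches:.1f} ns")
```

Under these assumed rates, most requests are served by L1 or L2, so the average cost sits far closer to the cache hit time than to the memory latency.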
GQA reduces KV-cache memory by the group factor
GQA shrinks the KV cache by the group factor (query heads per key-value head), because every query head in a group shares one key-value head
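The saving can be checked with a short size calculation. This is a sketch under stated assumptions: the helper name `kv_cache_bytes` is hypothetical, and the layer count, head counts, head dimension, and sequence length are illustrative values rather than the configuration of any particular model.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch_size=1, bytes_per_value=2):
    """Bytes needed to cache keys and values for every layer and token."""
    # Factor 2: one tensor for keys and one for values.
    # bytes_per_value=2 assumes 16-bit storage (fp16 or bf16).
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size * bytes_per_value


# Assumed example configuration.
n_layers, head_dim, seq_len = 80, 128, 4096
n_query_heads = 64   # multi-head attention keeps one KV head per query head
n_kv_heads = 8       # GQA shares each KV head across a group of query heads

mha_bytes = kv_cache_bytes(n_layers, n_query_heads, head_dim, seq_len)
gqa_bytes = kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len)
group_factor = n_query_heads // n_kv_heads

print(f"MHA KV cache: {mha_bytes / 2**30:.2f} GiB")
print(f"GQA KV cache: {gqa_bytes / 2**30:.2f} GiB")
print(f"reduction:    {mha_bytes // gqa_bytes}x (group factor {group_factor})")
```

Only the number of key-value heads changes between the two calls, so the ratio of the two sizes equals the group factor.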
Dynamic random-access memory
DRAM requires periodic refreshing to maintain data integrity
2024–present global memory supply shortage
Global DRAM shortage began in 2024
Memory hierarchy
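Memory hierarchy levels: registers → L1 → L2 → L3 → RAM → SSD → HDD (each level is larger and slower than the one before; the gap is a few × between cache levels and orders of magnitude between RAM and storage)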
What occupancy means in GPU programming
Occupancy = active warps per SM / maximum warps the SM supports
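As a worked instance of that ratio, the sketch below computes occupancy from a block size and a resident block count. The function name is hypothetical, and the default of 64 warps per SM is an assumption, since the real limit varies by GPU architecture. The warp size of 32 threads matches NVIDIA GPUs. The resident block count is taken as an input here. On real hardware it is limited by registers, shared memory, and per-SM block limits.

```python
def occupancy(threads_per_block, resident_blocks_per_sm, max_warps_per_sm=64, warp_size=32):
    """Fraction of an SM's warp slots that are occupied by active warps."""
    # A partially filled warp still takes a whole warp slot, hence ceiling division.
    warps_per_block = -(-threads_per_block // warp_size)
    # Guard: active warps can never exceed what the SM supports.
    active_warps = min(warps_per_block * resident_blocks_per_sm, max_warps_per_sm)
    return active_warps / max_warps_per_sm


# 256 threads = 8 warps per block; 6 resident blocks give 48 of 64 warp slots.
print(occupancy(threads_per_block=256, resident_blocks_per_sm=6))
# 100 threads round up to 4 warps per block; 4 resident blocks give 16 of 64.
print(occupancy(threads_per_block=100, resident_blocks_per_sm=4))
```

Higher occupancy gives the scheduler more warps to switch between while others wait on memory, though it does not by itself guarantee higher throughput.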
CUDA
CUDA enables parallel computation on GPUs