Occupancy = Active Warps / Max Warps
Image: Coolcaesar, CC BY 4.0, via Wikimedia Commons
Occupancy is the ratio of active warps on a streaming multiprocessor to the maximum number of warps that multiprocessor supports
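A minimal sketch of the occupancy ratio as plain arithmetic. The function name `occupancy` and the sample values (32 active warps, a limit of 64) are illustrative assumptions, not figures for any particular GPU.

```python
def occupancy(active_warps: int, max_warps: int) -> float:
    """Return occupancy as active warps divided by the maximum warps allowed."""
    if max_warps <= 0:
        raise ValueError("max_warps must be positive")
    if not 0 <= active_warps <= max_warps:
        raise ValueError("active_warps must lie between 0 and max_warps")
    return active_warps / max_warps


# Hypothetical example: 32 warps resident out of an assumed limit of 64.
print(occupancy(32, 64))  # 0.5
```

An occupancy of 1.0 means every available warp slot is in use; lower values mean some slots are empty.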
CPU cache
The L1/L2 cache hierarchy reduces the average latency of global memory accesses
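Warp divergence kills performance
Warp divergence occurs when threads in the same warp take different branch paths; the paths then execute one after another with the non-participating threads masked off, leading to idle cycles and reduced throughput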
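The cost of divergence can be illustrated with a deliberately simplified model, sketched here in Python rather than measured on hardware. It assumes a single if/else, a warp of 32 threads, and equal cost for both branch paths; the helper names `serialized_passes` and `lane_utilization` are hypothetical.

```python
WARP_SIZE = 32  # threads per warp assumed in this sketch


def serialized_passes(predicates):
    """Count the distinct branch paths the warp has to execute for one if/else."""
    return len({bool(p) for p in predicates})


def lane_utilization(predicates):
    """Fraction of lane slots doing useful work, assuming both paths cost the same."""
    return 1 / serialized_passes(predicates)


# Every thread takes the same path: one pass, no idle lanes.
uniform = [True] * WARP_SIZE
# Even and odd threads disagree: two passes, half the lanes idle in each.
divergent = [i % 2 == 0 for i in range(WARP_SIZE)]

print(serialized_passes(uniform), lane_utilization(uniform))      # 1 1.0
print(serialized_passes(divergent), lane_utilization(divergent))  # 2 0.5
```

Under these assumptions, a single divergent branch halves the useful work per cycle for the duration of the branch.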
CUDA
CUDA enables parallel computation on GPUs
Tensor cores
Tensor cores are specialized GPU hardware units for matrix multiply-accumulate operations
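For reference, a matrix multiply-accumulate computes D = A x B + C. The pure-Python sketch below shows only that arithmetic on small lists; it does not model how tensor cores execute it in hardware, and the function name `matmul_accumulate` is an illustrative assumption.

```python
def matmul_accumulate(a, b, c):
    """Return d = a @ b + c for matrices given as lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    d = [row[:] for row in c]  # start from the accumulator, leaving c untouched
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                d[i][j] += a[i][k] * b[k][j]
    return d


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = [[1, 1], [1, 1]]
print(matmul_accumulate(a, b, c))  # [[20, 23], [44, 51]]
```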
Dynamic random-access memory
DRAM requires periodic refreshing to maintain data integrity
Cooperative groups in CUDA
Cooperative groups enable flexible thread synchronization patterns in CUDA