What occupancy means in GPU programming

Occupancy = active warps per SM / maximum warps the SM supports
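The ratio above can be worked through with a small sketch. The block size, resident block count, and per-SM warp limit below are hypothetical example values, not figures for any particular GPU; real limits depend on the architecture and on per-kernel register and shared-memory use.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def occupancy(active_warps: int, max_warps: int) -> float:
    """Ratio of warps resident on an SM to the maximum the SM supports."""
    if max_warps <= 0:
        raise ValueError("max_warps must be positive")
    return active_warps / max_warps


# Hypothetical example values (assumptions, they vary by GPU and kernel).
threads_per_block = 256
resident_blocks_per_sm = 4
max_warps_per_sm = 64

# Ceiling division: a partially filled warp still occupies a warp slot.
warps_per_block = -(-threads_per_block // WARP_SIZE)
active_warps = warps_per_block * resident_blocks_per_sm

print(occupancy(active_warps, max_warps_per_sm))  # prints 0.5
```

With these numbers each block contributes 8 warps, so 4 resident blocks give 32 active warps out of 64, or 50% occupancy.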

Related concepts

CPU cache

L1/L2 cache hierarchy reduces global memory latency

Why warp divergence kills performance

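Warp divergence occurs when threads in the same warp take different branches. The warp then executes each branch path in turn with the non-participating threads masked off, leading to idle lanes and reduced throughput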
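A rough way to see the cost is to count how many serialized passes a warp needs. The sketch below is a simplified model, not a hardware simulator: it assumes every branch path costs the same, and the function name `divergence_cost` is hypothetical.

```python
def divergence_cost(branch_per_lane):
    """Return (passes, lane_utilization) for one warp under a simple model.

    branch_per_lane[i] identifies the branch path taken by lane i.
    Assumption: every path costs the same, and the warp runs one pass
    per distinct path with the other lanes masked off.
    """
    passes = len(set(branch_per_lane))
    # Each lane does useful work in exactly one of the passes.
    lane_utilization = 1.0 / passes
    return passes, lane_utilization


uniform = [0] * 32                                # all lanes take the same path
divergent = [lane % 2 for lane in range(32)]      # even and odd lanes split

print(divergence_cost(uniform))    # prints (1, 1.0)
print(divergence_cost(divergent))  # prints (2, 0.5)
```

Under this model a two-way split inside a warp halves the useful work per pass, which is why branching on values that are uniform across a warp is usually preferable.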

CUDA

CUDA enables parallel computation on GPUs

What tensor cores are

Tensor cores are specialized hardware units for matrix multiply-accumulate operations on GPUs
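For reference, a matrix multiply-accumulate computes D = A x B + C. The sketch below only illustrates that arithmetic in plain Python on a tiny 2x2 case; it says nothing about how the hardware does it, where the operation typically runs on fixed-size tiles with reduced-precision inputs. The helper name `mma` is hypothetical.

```python
def mma(a, b, c):
    """Return D = A x B + C for small matrices given as lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) + c[i][j]
         for j in range(cols)]
        for i in range(rows)
    ]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = [[1, 1], [1, 1]]

print(mma(a, b, c))  # prints [[20, 23], [44, 51]]
```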

Dynamic random-access memory

DRAM requires periodic refreshing to maintain data integrity

What cooperative groups enable in CUDA

Cooperative groups enable flexible thread synchronization patterns in CUDA
