

Why warp divergence kills performance

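Warp divergence occurs when threads in the same warp take different branches. The hardware then runs each branch path one after another with the non-participating threads masked off, which leaves lanes idle and reduces throughput.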
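
The cost of divergence can be illustrated with a toy model. The sketch below is a simplification, not a description of real hardware scheduling: it assumes each distinct branch taken by a warp is executed once in full, and the cycle counts in `cost` are hypothetical.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def warp_cycles(branch_of_lane, cost_of_branch):
    """Cycles one warp spends under a simple serialization model.

    Every distinct branch taken by at least one lane is executed once,
    while lanes on the other branches sit masked off.
    """
    taken = set(branch_of_lane)
    return sum(cost_of_branch[b] for b in taken)


def lane_utilization(branch_of_lane, cost_of_branch):
    """Fraction of lane-cycles that do useful work."""
    total = warp_cycles(branch_of_lane, cost_of_branch) * len(branch_of_lane)
    useful = sum(cost_of_branch[b] for b in branch_of_lane)
    return useful / total


cost = {"if": 10, "else": 10}  # hypothetical cycle counts per branch

# All 32 lanes take the same branch.
uniform = ["if"] * WARP_SIZE
# Even lanes take one branch, odd lanes the other.
divergent = ["if" if lane % 2 == 0 else "else" for lane in range(WARP_SIZE)]

print(warp_cycles(uniform, cost), lane_utilization(uniform, cost))      # 10 1.0
print(warp_cycles(divergent, cost), lane_utilization(divergent, cost))  # 20 0.5
```

In this model the divergent warp takes twice as long and keeps only half of its lanes busy, even though every thread does the same amount of work as before.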

Related concepts

What occupancy means in GPU programming

Occupancy = active warps on an SM / maximum warps the SM can hold
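
As a worked example of the ratio, the sketch below plugs in a hypothetical configuration. The limit of 64 warps per SM, the block size, and the number of resident blocks are all assumptions chosen for illustration; real values depend on the GPU and on each kernel's register and shared memory usage.

```python
def occupancy(active_warps, max_warps):
    """Occupancy = active warps / maximum warps an SM can hold."""
    if not 0 <= active_warps <= max_warps:
        raise ValueError("active_warps must be between 0 and max_warps")
    return active_warps / max_warps


# Hypothetical configuration, not tied to a specific GPU.
MAX_WARPS_PER_SM = 64
threads_per_block = 256
resident_blocks_per_sm = 4  # in practice limited by registers and shared memory

# Round up: a partially filled warp still occupies a warp slot.
warps_per_block = -(-threads_per_block // 32)             # 8 warps
active_warps = warps_per_block * resident_blocks_per_sm   # 32 warps

print(occupancy(active_warps, MAX_WARPS_PER_SM))  # 0.5
```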

Why load balancing loss is needed in MoE

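A load balancing loss in mixture-of-experts (MoE) models discourages expert collapse, where the router sends most tokens to a few experts, by penalizing an uneven distribution of tokens across experts.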
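
One common form of this loss can be sketched as follows. The card does not say which formulation it has in mind, so this assumes the auxiliary loss used in the Switch Transformer with top-1 routing, and it leaves out the scaling coefficient. The function name and the example router outputs are hypothetical.

```python
def load_balancing_loss(router_probs):
    """Auxiliary loss: num_experts * sum_i f_i * P_i.

    router_probs: one list of expert probabilities per token (each sums to 1).
    f_i = fraction of tokens whose top-1 expert is i.
    P_i = mean router probability assigned to expert i.
    """
    num_tokens = len(router_probs)
    num_experts = len(router_probs[0])
    f = [0.0] * num_experts
    P = [0.0] * num_experts
    for probs in router_probs:
        top1 = max(range(num_experts), key=lambda i: probs[i])
        f[top1] += 1 / num_tokens
        for i, p in enumerate(probs):
            P[i] += p / num_tokens
    return num_experts * sum(fi * pi for fi, pi in zip(f, P))


# Hypothetical router outputs for 4 tokens and 2 experts.
balanced = [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1], [0.1, 0.9]]
collapsed = [[0.9, 0.1]] * 4  # every token prefers expert 0

print(load_balancing_loss(balanced))   # about 1.0, the minimum for this form
print(load_balancing_loss(collapsed))  # about 1.8, penalizing the collapse
```

Adding this term to the training objective gives the router a gradient that pushes it away from the collapsed configuration.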

Flashbulb memory

Flashbulb memories are vivid but not always accurate

How kernel fusion reduces the memory bandwidth bottleneck

Kernel fusion eases the memory bandwidth bottleneck by combining multiple operations into a single kernel, so intermediate results stay in registers or shared memory instead of making a round trip through global memory.
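
The saving in memory traffic can be made concrete with a toy model. The sketch below is plain Python rather than real GPU code: `CountingMemory` is a hypothetical stand-in for global memory that counts element reads and writes, and the two operations (scale by 2, then add 1) are arbitrary.

```python
class CountingMemory:
    """Toy 'global memory' that counts element reads and writes."""

    def __init__(self):
        self.reads = 0
        self.writes = 0

    def load(self, buf, i):
        self.reads += 1
        return buf[i]

    def store(self, buf, i, value):
        self.writes += 1
        buf[i] = value


def unfused(mem, x):
    """Two 'kernels': the intermediate result goes through memory."""
    tmp = [0.0] * len(x)
    out = [0.0] * len(x)
    for i in range(len(x)):  # kernel 1: tmp = 2 * x
        mem.store(tmp, i, 2.0 * mem.load(x, i))
    for i in range(len(x)):  # kernel 2: out = tmp + 1
        mem.store(out, i, mem.load(tmp, i) + 1.0)
    return out


def fused(mem, x):
    """One 'kernel': the intermediate stays in a local variable."""
    out = [0.0] * len(x)
    for i in range(len(x)):
        scaled = 2.0 * mem.load(x, i)  # plays the role of a register
        mem.store(out, i, scaled + 1.0)
    return out


x = [1.0, 2.0, 3.0, 4.0]
mem_unfused, mem_fused = CountingMemory(), CountingMemory()
out_unfused = unfused(mem_unfused, x)
out_fused = fused(mem_fused, x)

print(mem_unfused.reads, mem_unfused.writes)  # 8 8
print(mem_fused.reads, mem_fused.writes)      # 4 4
```

Both versions compute the same result, but the fused one touches memory half as often because the intermediate array is never written or read back.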

What gradient checkpointing trades: recomputing activations to save memory

Gradient checkpointing trades extra computation for memory savings: it stores only a subset of activations during the forward pass and recomputes the rest during the backward pass.
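
The trade-off can be seen in a small sketch. This is a toy model under stated assumptions, not how any framework implements it: the network is a chain of scalar `tanh` layers, and the function names and the `every` parameter are hypothetical. Both functions return the gradient of the output with respect to the input, plus the number of activations held at peak.

```python
import math


def layer(h):
    return math.tanh(h)


def layer_grad(h):
    """Derivative of the layer, which needs the layer's input h."""
    return 1.0 - math.tanh(h) ** 2


def grad_full(x, n_layers):
    """Baseline: keep every layer input from the forward pass."""
    inputs = []
    h = x
    for _ in range(n_layers):
        inputs.append(h)
        h = layer(h)
    g = 1.0
    for h_in in reversed(inputs):
        g *= layer_grad(h_in)
    return g, len(inputs)


def grad_checkpointed(x, n_layers, every):
    """Keep only every `every`-th activation; recompute the rest in backward."""
    checkpoints = {0: x}
    h = x
    for i in range(1, n_layers + 1):
        h = layer(h)
        if i % every == 0 and i < n_layers:
            checkpoints[i] = h
    g = 1.0
    peak = len(checkpoints)
    end = n_layers
    for start in sorted(checkpoints, reverse=True):
        # Recompute the inputs of layers start+1..end from the checkpoint.
        segment = [checkpoints[start]]
        for _ in range(end - start - 1):
            segment.append(layer(segment[-1]))
        # segment[0] is the checkpoint itself, so do not count it twice.
        peak = max(peak, len(checkpoints) + len(segment) - 1)
        for h_in in reversed(segment):
            g *= layer_grad(h_in)
        end = start
    return g, peak


g_full, stored_full = grad_full(0.5, 8)
g_ckpt, stored_ckpt = grad_checkpointed(0.5, 8, every=4)
print(stored_full, stored_ckpt)  # 8 5
```

The gradients match, while the checkpointed version holds fewer activations at once and pays for it by running parts of the forward pass a second time.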

Overdrawn at the Memory Bank

Overdrawn at the Memory Bank was shot on videotape due to budget constraints
