
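Warp divergence occurs when threads in the same warp take different branch paths. The paths execute one after another with non-participating threads masked off, leading to idle cycles and reduced throughput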
Image: Robert M. Lavinsky, CC BY-SA 3.0, via Wikimedia Commons
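A rough way to see the cost of divergence is to model a single warp in plain Python. This is only a sketch under simplifying assumptions: each branch path has a fixed instruction count, divergent paths are fully serialized, and the function name `warp_issue_slots` and the path labels are hypothetical, not part of any GPU API.

```python
WARP_SIZE = 32  # threads per warp on current NVIDIA GPUs


def warp_issue_slots(thread_paths, path_cost):
    """Toy model of one warp executing a branch.

    thread_paths: the branch path label taken by each thread in the warp.
    path_cost: instruction count of each path (hypothetical numbers).
    Returns (cycles, utilization), where utilization is the fraction of
    thread slots doing useful work while the warp is issuing.
    """
    # Every distinct path is executed once for the whole warp, one after another.
    cycles = sum(path_cost[p] for p in set(thread_paths))
    # A thread only does useful work while its own path is being executed.
    useful = sum(path_cost[p] for p in thread_paths)
    utilization = useful / (cycles * len(thread_paths))
    return cycles, utilization


# All threads take the same path: no divergence.
uniform = ["a"] * WARP_SIZE
# Even and odd threads take different paths: the warp runs both in turn.
divergent = ["a" if t % 2 == 0 else "b" for t in range(WARP_SIZE)]

cost = {"a": 10, "b": 10}
print(warp_issue_slots(uniform, cost))
print(warp_issue_slots(divergent, cost))
```

In this model the divergent warp needs twice the cycles of the uniform one, and half of its thread slots sit idle.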
What occupancy means in GPU programming
Occupancy = Active Warps / Max Warps, with both counted per streaming multiprocessor (SM)
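The ratio is easy to work through by hand. The sketch below assumes a hypothetical SM that supports 64 resident warps and a kernel for which 4 blocks of 256 threads fit on the SM at once. Both numbers, and the helper names, are illustrative assumptions; real limits depend on the GPU and on the kernel's register and shared memory use.

```python
def active_warps_per_sm(threads_per_block, resident_blocks, warp_size=32):
    """Warps resident on one SM, given how many blocks fit on it."""
    # A partially filled warp still occupies a full warp slot (ceiling division).
    warps_per_block = -(-threads_per_block // warp_size)
    return warps_per_block * resident_blocks


def occupancy(active_warps, max_warps):
    """Occupancy = active warps / max warps, both per SM."""
    if max_warps <= 0:
        raise ValueError("max_warps must be positive")
    return active_warps / max_warps


# Hypothetical configuration: 256-thread blocks, 4 resident blocks, 64-warp SM.
active = active_warps_per_sm(threads_per_block=256, resident_blocks=4)
print(active, occupancy(active, max_warps=64))
```

Here each block contributes 8 warps, so 32 of the 64 warp slots are filled and occupancy is 0.5.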
Why load balancing loss is needed in MoE
Load balancing loss in MoE (mixture of experts) prevents expert collapse, where the router sends most tokens to a few experts, by penalizing uneven distribution of workload across experts
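One common formulation, used here as an assumption rather than as the only definition, is the Switch Transformer style auxiliary loss: the number of experts times the sum over experts of (fraction of tokens routed to the expert) times (mean router probability for the expert). The sketch below implements it in plain Python with top-1 routing; the function name and the toy router outputs are hypothetical.

```python
def load_balancing_loss(router_probs):
    """Auxiliary loss N * sum_i f_i * P_i for top-1 routing.

    router_probs: one list of per-expert probabilities per token.
    f_i is the fraction of tokens whose top expert is i,
    P_i is the mean router probability assigned to expert i.
    """
    num_tokens = len(router_probs)
    num_experts = len(router_probs[0])
    counts = [0] * num_experts
    prob_sums = [0.0] * num_experts
    for probs in router_probs:
        chosen = max(range(num_experts), key=lambda e: probs[e])  # top-1 expert
        counts[chosen] += 1
        for e in range(num_experts):
            prob_sums[e] += probs[e]
    f = [c / num_tokens for c in counts]
    p = [s / num_tokens for s in prob_sums]
    return num_experts * sum(fi * pi for fi, pi in zip(f, p))


# Tokens split evenly between two experts.
balanced = [[0.6, 0.4], [0.4, 0.6], [0.6, 0.4], [0.4, 0.6]]
# Every token goes to expert 0 with high confidence (collapse).
collapsed = [[0.9, 0.1]] * 4

print(load_balancing_loss(balanced))   # close to the minimum of 1.0
print(load_balancing_loss(collapsed))  # noticeably larger
```

Adding this term to the training loss, scaled by a small coefficient, pushes the router toward the balanced case.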
Flashbulb memory
Flashbulb memories are vivid but not always accurate
How kernel fusion reduces the memory bandwidth bottleneck
Kernel fusion reduces the memory bandwidth bottleneck by combining multiple operations into a single kernel, so intermediate results stay in registers or on-chip memory instead of being written to and re-read from global memory
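The saving can be illustrated with a simple traffic count. The sketch below is not GPU code: it models three elementwise operations (scale, add bias, ReLU) as three separate passes versus one fused pass, and counts one memory read and one memory write per element per pass. The function names and the traffic model are assumptions made for illustration.

```python
def unfused(xs, scale, bias):
    """Three separate 'kernels', each reading and writing a full array."""
    n = len(xs)
    traffic = 0
    scaled = [x * scale for x in xs]        # kernel 1: read xs, write scaled
    traffic += 2 * n
    shifted = [s + bias for s in scaled]    # kernel 2: read scaled, write shifted
    traffic += 2 * n
    out = [max(0.0, v) for v in shifted]    # kernel 3: read shifted, write out
    traffic += 2 * n
    return out, traffic


def fused(xs, scale, bias):
    """One 'kernel': intermediates never leave the loop body."""
    out = [max(0.0, x * scale + bias) for x in xs]  # read xs once, write out once
    return out, 2 * len(xs)


data = [-2.0, -0.5, 0.0, 1.5, 3.0]
out_a, traffic_a = unfused(data, scale=2.0, bias=1.0)
out_b, traffic_b = fused(data, scale=2.0, bias=1.0)
print(out_a == out_b, traffic_a, traffic_b)
```

Both versions compute the same result, but under this model the fused version moves a third of the data.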
What gradient checkpointing trades: recomputes activations to save memory
Gradient checkpointing trades extra computation for memory savings: it stores only a subset of activations during the forward pass and recomputes the rest during the backward pass
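The trade-off can be made concrete with a toy chain of scalar layers. The sketch below assumes each layer is `tanh(w * x)` and compares a backward pass that stores every layer input with one that stores only the input of each segment and recomputes the rest. All names here are hypothetical, and the counters track stored layer inputs and calls to the forward function only.

```python
import math


def forward_layer(x, w):
    return math.tanh(w * x)


def layer_grad(x, w):
    """d(forward_layer)/dx, which needs the layer's input x."""
    t = math.tanh(w * x)
    return w * (1.0 - t * t)


def grad_full(x, weights):
    """Store every layer input. Returns (grad, stored_inputs, forward_calls)."""
    acts = [x]
    for w in weights[:-1]:
        acts.append(forward_layer(acts[-1], w))
    g = 1.0
    for i in reversed(range(len(weights))):
        g *= layer_grad(acts[i], weights[i])
    return g, len(acts), len(weights) - 1


def grad_checkpointed(x, weights, segment):
    """Store only segment inputs; recompute the rest per segment in backward."""
    n = len(weights)
    checkpoints, h, forward_calls = {}, x, 0
    for i, w in enumerate(weights[:-1]):
        if i % segment == 0:
            checkpoints[i] = h
        h = forward_layer(h, w)
        forward_calls += 1
    if (n - 1) % segment == 0:
        checkpoints[n - 1] = h
    g, peak = 1.0, 0
    for start in sorted(checkpoints, reverse=True):
        end = min(start + segment, n)
        acts = [checkpoints[start]]
        for i in range(start, end - 1):  # recompute inputs inside the segment
            acts.append(forward_layer(acts[-1], weights[i]))
            forward_calls += 1
        # Live memory: all checkpoints plus the recomputed inputs of this segment.
        peak = max(peak, len(checkpoints) + len(acts) - 1)
        for i in reversed(range(start, end)):
            g *= layer_grad(acts[i - start], weights[i])
    return g, peak, forward_calls


weights = [0.9, 1.1, 0.8, 1.2, 0.7, 1.3, 1.0, 0.95]
print(grad_full(0.5, weights))
print(grad_checkpointed(0.5, weights, segment=4))
```

Both functions return the same gradient. With 8 layers and segments of 4, the checkpointed version keeps fewer layer inputs alive at once and pays for it with extra forward calls.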
Overdrawn at the Memory Bank
Overdrawn at the Memory Bank was shot on videotape due to budget constraints