
Operator fusion merges adjacent operations to optimize execution and reduce memory traffic
Image: Emperor Genius, CC BY-SA 3.0, via Wikimedia Commons
What fused kernels do
Fused kernels combine multiple operations into a single kernel launch, so intermediate results stay in registers or on-chip memory instead of making round trips to global memory
Why kernel fusion reduces the memory bandwidth bottleneck
Kernel fusion eases the memory bandwidth bottleneck by combining multiple operations into a single kernel, so intermediate results are not written to global memory and read back between operations
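To make the memory-traffic argument concrete, here is a minimal sketch in plain Python. It is a toy model, not a real GPU kernel: `CountingBuffer` is a hypothetical helper that stands in for global memory and counts element reads and writes, and the scale, add-bias, ReLU chain is an assumed example of three adjacent elementwise operations.

```python
class CountingBuffer:
    """Hypothetical stand-in for a global-memory array that counts accesses."""

    def __init__(self, data):
        self.data = list(data)
        self.reads = 0
        self.writes = 0

    def load(self, i):
        self.reads += 1
        return self.data[i]

    def store(self, i, value):
        self.writes += 1
        self.data[i] = value


def unfused_scale_add_relu(x, scale, bias):
    """Three separate 'kernels'; each intermediate round-trips through memory."""
    n = len(x.data)
    t1 = CountingBuffer([0.0] * n)
    t2 = CountingBuffer([0.0] * n)
    out = CountingBuffer([0.0] * n)
    for i in range(n):  # kernel 1: scale
        t1.store(i, x.load(i) * scale)
    for i in range(n):  # kernel 2: add bias
        t2.store(i, t1.load(i) + bias)
    for i in range(n):  # kernel 3: ReLU
        out.store(i, max(t2.load(i), 0.0))
    traffic = sum(b.reads + b.writes for b in (x, t1, t2, out))
    return out.data, traffic


def fused_scale_add_relu(x, scale, bias):
    """One 'kernel'; intermediates live in a local variable (a 'register')."""
    n = len(x.data)
    out = CountingBuffer([0.0] * n)
    for i in range(n):
        v = x.load(i) * scale  # read the input once
        v = v + bias           # no memory access for the intermediate
        out.store(i, max(v, 0.0))  # write the result once
    traffic = x.reads + x.writes + out.reads + out.writes
    return out.data, traffic
```

In this model the unfused version performs 6 memory accesses per element (one read and one write for each of the three kernels), while the fused version performs 2, with identical results.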
Overlapping subproblems
Dynamic programming applies when a problem has overlapping subproblems: it stores each subproblem's result so that it is computed only once instead of being recalculated every time it recurs
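A small illustration, using Fibonacci numbers as an assumed example of a problem with overlapping subproblems. The `calls` counter is added only to show how much work is repeated; `functools.lru_cache` from the standard library does the storing.

```python
from functools import lru_cache

# Call counters, included only to make the redundant work visible.
calls = {"naive": 0, "memo": 0}


def fib_naive(n):
    """Plain recursion: the same subproblems are solved over and over."""
    calls["naive"] += 1
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)


@lru_cache(maxsize=None)
def fib_memo(n):
    """Memoized recursion: each subproblem is computed once, then cached."""
    calls["memo"] += 1
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)
```

Calling both with `n = 20` returns the same value, but the memoized version runs its body only once per distinct argument (21 times), while the naive version makes tens of thousands of calls.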
What gradient checkpointing trades: recomputation for memory
Gradient checkpointing trades extra computation for lower memory use: it stores only a subset of activations during the forward pass and recomputes the rest when they are needed in the backward pass
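The trade can be sketched on a toy network. The example below assumes a chain of scalar layers `tanh(w * x)` and computes the gradient of the output with respect to the input by hand; the function names and the `segment` parameter are hypothetical, chosen for illustration, and this is not how any particular framework implements checkpointing.

```python
import math


def layer(w, x):
    """One toy layer: a scalar weight followed by tanh."""
    return math.tanh(w * x)


def layer_grad_input(w, x):
    """Derivative of layer(w, x) with respect to its input x."""
    return w * (1.0 - math.tanh(w * x) ** 2)


def grad_full(weights, x0):
    """Baseline: store every layer input, then backpropagate."""
    inputs = []
    x = x0
    for w in weights:
        inputs.append(x)
        x = layer(w, x)
    g = 1.0
    for w, xin in zip(reversed(weights), reversed(inputs)):
        g *= layer_grad_input(w, xin)
    return g, len(inputs)  # gradient, number of stored activations


def grad_checkpointed(weights, x0, segment):
    """Store only each segment's input; recompute the rest during backward."""
    checkpoints = {}
    x = x0
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints[i] = x  # keep one activation per segment
        x = layer(w, x)

    g = 1.0
    peak = len(checkpoints)
    for start in sorted(checkpoints, reverse=True):
        seg_weights = weights[start:start + segment]
        # Recompute this segment's layer inputs from its checkpoint.
        inputs = []
        x = checkpoints[start]
        for w in seg_weights:
            inputs.append(x)
            x = layer(w, x)
        # inputs[0] is the checkpoint itself, so do not count it twice.
        peak = max(peak, len(checkpoints) + len(inputs) - 1)
        for w, xin in zip(reversed(seg_weights), reversed(inputs)):
            g *= layer_grad_input(w, xin)
    return g, peak  # gradient, peak number of stored activations
```

With 9 layers and `segment=3`, both functions return the same gradient, but the checkpointed version holds at most 5 activations at once instead of 9, at the cost of running each layer's forward computation a second time.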
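Why a load balancing loss is needed in MoE
A load balancing loss in a mixture-of-experts (MoE) model discourages expert collapse, where the router sends most tokens to a few experts, by penalizing uneven distribution of tokens across experts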
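One common formulation, used here as an assumption because the card does not name a specific loss, is the auxiliary loss from the Switch Transformer: for N experts, loss = alpha * N * sum_i(f_i * P_i), where f_i is the fraction of tokens routed to expert i and P_i is the mean router probability for expert i. A minimal sketch with top-1 routing, where `load_balancing_loss` and its `alpha` default are hypothetical names and values:

```python
def load_balancing_loss(router_probs, alpha=1.0):
    """Switch-Transformer-style auxiliary loss (assumed formulation).

    router_probs: one list of per-expert probabilities for each token.
    alpha: loss weight; 1.0 is an illustrative default, not a recommendation.
    """
    num_tokens = len(router_probs)
    num_experts = len(router_probs[0])
    token_fraction = [0.0] * num_experts  # f_i: share of tokens sent to expert i
    mean_prob = [0.0] * num_experts       # P_i: mean router probability for expert i

    for probs in router_probs:
        # Top-1 routing: each token goes to its highest-probability expert.
        top = max(range(num_experts), key=lambda e: probs[e])
        token_fraction[top] += 1.0 / num_tokens
        for e in range(num_experts):
            mean_prob[e] += probs[e] / num_tokens

    return alpha * num_experts * sum(
        f * p for f, p in zip(token_fraction, mean_prob)
    )
```

Under this formulation the loss is 1.0 when tokens and probabilities are spread evenly across experts and grows toward N as routing collapses onto a single expert, so minimizing it pushes the router toward balanced assignments.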
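Why warp divergence kills performance
Warp divergence occurs when threads in the same warp take different branches: the warp executes each branch path in turn with the non-participating threads masked off, which leaves lanes idle and reduces throughput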
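The cost can be estimated with a deliberately simplified lockstep model, sketched below in plain Python. It assumes a warp of 32 lanes and that every distinct branch taken by any lane is executed serially at a fixed cost; the function names and cost values are hypothetical, and real hardware scheduling is more involved.

```python
WARP_SIZE = 32  # assumed warp width for this sketch


def warp_cycles(branch_ids, cost_per_branch):
    """Cycles for one warp: each distinct branch taken is executed in turn."""
    taken = set(branch_ids)
    return sum(cost_per_branch[b] for b in taken)


def lane_utilization(branch_ids, cost_per_branch):
    """Fraction of lane-cycles spent doing useful work."""
    total_cycles = warp_cycles(branch_ids, cost_per_branch)
    # Each lane is only active while its own branch is executing.
    useful = sum(cost_per_branch[b] for b in branch_ids)
    return useful / (total_cycles * len(branch_ids))
```

For example, with two branches costing 10 cycles each, a warp whose lanes all take the same branch finishes in 10 cycles at full utilization, while a warp split evenly between the two branches needs 20 cycles and keeps only half of its lanes busy at any time.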