Kernel fusion reduces the memory bandwidth bottleneck by combining multiple operations into a single kernel, minimizing data transfers

Image: Chad Davis, CC BY 2.0, via Wikimedia Commons

Kernel fusion reduces the memory bandwidth bottleneck

Kernel fusion reduces the memory bandwidth bottleneck by combining multiple operations into a single kernel: intermediate results stay in registers or on-chip memory instead of being written to and re-read from global memory
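A back-of-the-envelope sketch of the savings, assuming a chain of elementwise unary ops on float32 data where each unfused op runs as its own kernel that reads its input from, and writes its output to, global memory. The function name `global_memory_bytes` is hypothetical, and the model ignores caches and kernel-launch overhead.

```python
def global_memory_bytes(n_elements: int, num_ops: int, fused: bool,
                        bytes_per_element: int = 4) -> int:
    """Estimate global-memory traffic for a chain of elementwise unary ops.

    Unfused: each of the num_ops kernels reads n_elements and writes n_elements.
    Fused: a single kernel reads the input once and writes the final output once;
    intermediates stay in registers and never touch global memory.
    """
    kernels = 1 if fused else num_ops
    return kernels * 2 * n_elements * bytes_per_element


# Example: y = relu(x * 2 + 1) on one million float32 values is three elementwise ops.
unfused_bytes = global_memory_bytes(1_000_000, 3, fused=False)  # 24,000,000 bytes
fused_bytes = global_memory_bytes(1_000_000, 3, fused=True)     # 8,000,000 bytes
```

For a memory-bound chain like this one, traffic shrinks by a factor of `num_ops` under this model, which is where the speedup from fusion comes from.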

Related concepts

What fused kernels do

Fused kernels combine multiple operations into one kernel, so intermediate results never make a round-trip through global memory
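A pure-Python sketch (not GPU code) of the difference for an illustrative `relu(x * a + b)` chain. Python lists stand in for global-memory buffers and a local variable stands in for a register; the function names `unfused` and `fused` are hypothetical.

```python
from typing import List


def unfused(x: List[float], a: float, b: float) -> List[float]:
    # Three separate "kernels": each pass reads a full array and writes a new one.
    t1 = [v * a for v in x]            # kernel 1: multiply, writes intermediate t1
    t2 = [v + b for v in t1]           # kernel 2: add, reads t1, writes t2
    return [max(v, 0.0) for v in t2]   # kernel 3: ReLU, reads t2, writes the output


def fused(x: List[float], a: float, b: float) -> List[float]:
    # One "kernel": each element is loaded once, all three ops are applied while
    # the value is held in a local variable, and the result is stored once.
    out = []
    for v in x:
        out.append(max(v * a + b, 0.0))
    return out
```

Both produce the same values; only the fused version avoids materializing `t1` and `t2`.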

What operator fusion does at the compiler level: merges adjacent ops to reduce memory traffic

Operator fusion merges adjacent operations in the computation graph into a single kernel, reducing memory traffic and kernel-launch overhead
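A toy version of the simplest such compiler pass, assuming a linear chain of named ops where only elementwise ops may fuse with each other. The `ELEMENTWISE` set and `fuse_elementwise` function are hypothetical; production compilers such as XLA and PyTorch's TorchInductor use far richer rules (for example, fusing elementwise epilogues into matrix multiplications).

```python
from typing import List

# Hypothetical set of elementwise op names for this toy graph.
ELEMENTWISE = {"add", "mul", "relu", "gelu", "sigmoid"}


def fuse_elementwise(ops: List[str]) -> List[List[str]]:
    """Group a linear chain of ops into kernels, merging runs of adjacent elementwise ops."""
    kernels: List[List[str]] = []
    for op in ops:
        can_extend = (
            op in ELEMENTWISE
            and kernels
            and all(o in ELEMENTWISE for o in kernels[-1])
        )
        if can_extend:
            kernels[-1].append(op)   # extend the current fused elementwise kernel
        else:
            kernels.append([op])     # start a new kernel
    return kernels


plan = fuse_elementwise(["matmul", "add", "relu", "matmul", "mul", "sigmoid"])
# [['matmul'], ['add', 'relu'], ['matmul'], ['mul', 'sigmoid']]: 6 ops become 4 kernels
```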

Why Triton auto-tunes BLOCK_SIZE: the best size differs across hardware

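Triton's autotuner benchmarks candidate configurations (such as different BLOCK_SIZE values) and keeps the fastest one for the current GPU and problem size, since block size affects memory access patterns, occupancy, and computational throughput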
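In Triton this is declared with the `@triton.autotune` decorator, which takes a list of `triton.Config` candidates and a `key` list of argument names; it benchmarks the candidates when the key arguments take a new value and caches the winner. Because a real Triton kernel needs a GPU, the sketch below only simulates that selection logic in plain Python. `ToyAutotuner` and the `fake_cost` model are hypothetical stand-ins, not Triton APIs, and the cost numbers are invented for illustration.

```python
import math
from typing import Callable, Dict, List

Config = Dict[str, int]


class ToyAutotuner:
    """Benchmark every candidate config once per problem size and cache the fastest.

    Mimics the idea behind Triton's autotuner; it is NOT Triton's implementation.
    """

    def __init__(self, configs: List[Config],
                 benchmark: Callable[[Config, int], float]) -> None:
        self.configs = configs
        self.benchmark = benchmark          # returns a runtime for (config, n_elements)
        self.cache: Dict[int, Config] = {}  # keyed by n_elements, like key=['n_elements']
        self.benchmark_calls = 0

    def best_config(self, n_elements: int) -> Config:
        if n_elements not in self.cache:
            best_time, best_cfg = math.inf, self.configs[0]
            for cfg in self.configs:
                self.benchmark_calls += 1
                t = self.benchmark(cfg, n_elements)
                if t < best_time:
                    best_time, best_cfg = t, cfg
            self.cache[n_elements] = best_cfg
        return self.cache[n_elements]


def fake_cost(cfg: Config, n_elements: int) -> float:
    """Hypothetical cost model standing in for a real GPU timing run:
    a fixed cost per launched block plus a penalty for padded (masked-off) lanes."""
    block = cfg["BLOCK_SIZE"]
    blocks = math.ceil(n_elements / block)
    return blocks * 5.0 + (blocks * block - n_elements) * 0.1


tuner = ToyAutotuner([{"BLOCK_SIZE": b} for b in (64, 256, 1024)], fake_cost)
small = tuner.best_config(100)    # {'BLOCK_SIZE': 64} under this cost model
large = tuner.best_config(4096)   # {'BLOCK_SIZE': 1024} under this cost model
```

Small inputs favor small blocks (less padding) and large inputs favor large blocks (fewer launches), which is why a single hard-coded BLOCK_SIZE rarely wins everywhere.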

Von Neumann architecture

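The CPU must fetch both instructions and data from the same memory over a shared path, so memory bandwidth limits throughput (the von Neumann bottleneck)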
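A toy stored-program machine makes the point concrete: program and data live in one memory list, so every instruction fetch and every operand access goes through memory. The instruction set (`LOAD`, `ADD`, `STORE`, `HALT`) and the `run` function are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple, Union

# A memory cell holds either data (int) or an instruction (tuple).
Word = Union[int, Tuple]


def run(memory: List[Word]) -> Tuple[int, int, int]:
    """Run a toy accumulator machine whose program and data share one memory.

    Returns (accumulator, instruction_fetches, data_accesses).
    """
    pc, acc = 0, 0
    fetches = data_accesses = 0
    while True:
        instr = memory[pc]        # the instruction itself is read from memory...
        fetches += 1
        pc += 1
        op = instr[0]
        if op == "HALT":
            return acc, fetches, data_accesses
        addr = instr[1]
        data_accesses += 1        # ...and so is every operand
        if op == "LOAD":
            acc = memory[addr]
        elif op == "ADD":
            acc += memory[addr]
        elif op == "STORE":
            memory[addr] = acc
        else:
            raise ValueError(f"unknown opcode {op!r}")


# Cells 0-3 hold the program; cells 4-6 hold the data (2, 3, and a result slot).
program: List[Word] = [("LOAD", 4), ("ADD", 5), ("STORE", 6), ("HALT",), 2, 3, 0]
acc, fetches, data = run(program)
```

Even this three-instruction program performs more instruction fetches (4) than data accesses (3), and all seven compete for the same memory path.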

Why a load-balancing loss is needed in MoE

In mixture-of-experts (MoE) models, a load-balancing loss prevents expert collapse (the router sending most tokens to a few experts) by encouraging tokens to be spread evenly across experts
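One widely used formulation is the auxiliary loss from the Switch Transformer: for N experts, loss = alpha * N * sum_i f_i * P_i, where f_i is the fraction of tokens whose top-1 expert is i and P_i is the mean router probability assigned to expert i. The sketch below implements that formulation in plain Python; the function name and the default `alpha=0.01` are assumptions, and other MoE variants use different balancing terms.

```python
from typing import List


def load_balancing_loss(router_probs: List[List[float]], alpha: float = 0.01) -> float:
    """Switch-Transformer-style auxiliary loss: alpha * N * sum_i f_i * P_i.

    router_probs[t][i] is the router's softmax probability of sending token t to expert i.
    """
    num_tokens = len(router_probs)
    num_experts = len(router_probs[0])
    counts = [0] * num_experts          # tokens whose top-1 choice is each expert
    prob_sums = [0.0] * num_experts     # summed router probability per expert
    for probs in router_probs:
        top1 = max(range(num_experts), key=lambda i: probs[i])
        counts[top1] += 1
        for i, p in enumerate(probs):
            prob_sums[i] += p
    f = [c / num_tokens for c in counts]      # dispatch fraction f_i
    P = [s / num_tokens for s in prob_sums]   # mean probability P_i
    return alpha * num_experts * sum(fi * pi for fi, pi in zip(f, P))


balanced = load_balancing_loss([[0.9, 0.1], [0.1, 0.9]], alpha=1.0)   # approximately 1.0
collapsed = load_balancing_loss([[0.9, 0.1], [0.8, 0.2]], alpha=1.0)  # approximately 1.7
```

The value equals alpha when routing is perfectly balanced and grows as tokens pile onto fewer experts, so adding it to the task loss pushes the router back toward an even split.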

What gradient checkpointing trades: recomputes activations to save memory

Gradient checkpointing trades extra computation for memory savings: it stores only some activations during the forward pass and recomputes the rest during the backward pass
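A minimal sketch on a chain of scalar layers y = tanh(w * x), assuming segment-wise checkpointing: the forward pass keeps only the input of every `segment`-th layer, and the backward pass recomputes each segment from its checkpoint before applying the chain rule. All names here are hypothetical; in PyTorch the same trade is available through `torch.utils.checkpoint.checkpoint`.

```python
import math
from typing import List, Tuple


def layer(w: float, x: float) -> float:
    """One scalar 'layer': y = tanh(w * x)."""
    return math.tanh(w * x)


def layer_grad(w: float, x: float) -> float:
    """dy/dx of y = tanh(w * x), evaluated at input x."""
    y = math.tanh(w * x)
    return w * (1.0 - y * y)


def grad_full(weights: List[float], x0: float) -> Tuple[float, int]:
    """Store every activation, then backprop. Returns (d out / d x0, activations stored)."""
    acts = [x0]
    for w in weights:
        acts.append(layer(w, acts[-1]))
    g = 1.0
    for w, x in zip(reversed(weights), reversed(acts[:-1])):
        g *= layer_grad(w, x)
    return g, len(acts)


def grad_checkpointed(weights: List[float], x0: float,
                      segment: int) -> Tuple[float, int, int]:
    """Keep only segment inputs; recompute inside each segment during backward.

    Returns (d out / d x0, checkpoints stored, total forward layer evaluations).
    """
    n = len(weights)
    checkpoints = {}
    x, forward_evals = x0, 0
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints[i] = x            # input to layer i is kept
        x = layer(w, x)
        forward_evals += 1
    g = 1.0
    for start in sorted(checkpoints, reverse=True):
        end = min(start + segment, n)
        acts = [checkpoints[start]]       # recompute inputs of layers start..end-1
        for w in weights[start:end - 1]:
            acts.append(layer(w, acts[-1]))
            forward_evals += 1
        for w, a in zip(reversed(weights[start:end]), reversed(acts)):
            g *= layer_grad(w, a)
    return g, len(checkpoints), forward_evals


weights = [0.5, 1.2, -0.7, 0.9, 1.1, -0.3, 0.8, 0.6]
g_full, stored_full = grad_full(weights, 0.3)
g_ckpt, stored_ckpt, evals = grad_checkpointed(weights, 0.3, segment=4)
```

Here 8 forward layer evaluations become 14, while only 2 checkpoints (plus one segment's activations during backward) are kept instead of 9 activations; choosing `segment` near sqrt(n) is a common way to balance the two costs.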
