
Fused kernels combine multiple operations into one kernel to avoid memory round-trips
Image: Pratik89Roy, CC BY-SA 4.0, via Wikimedia Commons
Kernel fusion
Kernel fusion eases the memory-bandwidth bottleneck by combining multiple operations into a single kernel, so intermediate results stay in registers or on-chip memory instead of being written to and re-read from global memory
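To make the saving concrete, here is a minimal sketch in plain Python (not a real GPU kernel) that counts simulated element reads and writes for y = relu(x * a + b), once as three separate passes and once as a single fused pass. The function names and the traffic counter are hypothetical and exist only for illustration.

```python
def unfused(x, a, b):
    """Three separate passes; each one reads and writes a full array."""
    traffic = 0
    t1 = [v * a for v in x]          # pass 1: read x, write t1
    traffic += 2 * len(x)
    t2 = [v + b for v in t1]         # pass 2: read t1, write t2
    traffic += 2 * len(x)
    y = [max(v, 0.0) for v in t2]    # pass 3: read t2, write y
    traffic += 2 * len(x)
    return y, traffic


def fused(x, a, b):
    """One pass; intermediates stay in local variables (a stand-in for registers)."""
    y = [max(v * a + b, 0.0) for v in x]   # read x once, write y once
    traffic = 2 * len(x)
    return y, traffic


x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y_unfused, t_unfused = unfused(x, 2.0, 1.0)
y_fused, t_fused = fused(x, 2.0, 1.0)
print(t_unfused, t_fused)  # 30 10
```

Both versions produce the same output; under this toy model the fused version moves one third of the data.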
Operator fusion
Operator fusion is the compiler-level counterpart: the compiler merges adjacent operations in the computation graph into one kernel, which reduces memory traffic and kernel-launch overhead
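A toy version of such a compiler pass can be sketched in Python. It assumes a linear list of op names and a hypothetical set of elementwise ops; real compilers work on full graphs and apply more legality checks than this.

```python
ELEMENTWISE = {"mul2", "add1", "relu"}  # hypothetical op names, assumed fusable


def fuse_adjacent(ops):
    """Group runs of adjacent elementwise ops into single fused nodes."""
    fused, run = [], []
    for op in ops:
        if op in ELEMENTWISE:
            run.append(op)               # extend the current fusable run
        else:
            if run:
                fused.append(tuple(run)) # close the run before a non-fusable op
                run = []
            fused.append((op,))          # non-elementwise op stays on its own
    if run:
        fused.append(tuple(run))
    return fused


plan = fuse_adjacent(["mul2", "add1", "relu", "sum", "add1"])
print(plan)  # [('mul2', 'add1', 'relu'), ('sum',), ('add1',)]
```

Five ops become three kernels here: the reduction `sum` breaks the run because it is not in the elementwise set.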
Overlapping subproblems
Dynamic programming handles overlapping subproblems by storing each subproblem's result so that it is computed only once
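A small sketch of the idea, using Fibonacci numbers as the example problem (an illustrative choice, not taken from the card) and the standard library's `functools.lru_cache` as the result store:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib(n):
    """Fibonacci with memoization: each distinct n is computed once."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


result = fib(30)
misses = fib.cache_info().misses  # number of subproblems actually computed
print(result, misses)  # 832040 31
```

Without the cache the same call tree would recompute the small subproblems over a million times; with it, only the 31 distinct values fib(0) through fib(30) are evaluated.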
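Triton auto-tuning of BLOCK_SIZE
Triton can auto-tune BLOCK_SIZE: with the `triton.autotune` decorator it benchmarks a list of candidate configurations and keeps the fastest one, because the block size that gives the best memory access pattern and throughput differs between GPUs and input shapes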
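The underlying idea, benchmark each candidate and keep the fastest, can be sketched without a GPU. The code below is plain Python and is not Triton's API; `blocked_sum`, `autotune`, and the candidate list are hypothetical stand-ins.

```python
import time


def blocked_sum(data, block_size):
    """Sum a list in chunks of block_size (stand-in for a tiled kernel)."""
    total = 0
    for start in range(0, len(data), block_size):
        total += sum(data[start:start + block_size])
    return total


def autotune(fn, data, candidates, repeats=3):
    """Time fn for each candidate block size and return the fastest one."""
    timings = {}
    for block_size in candidates:
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            fn(data, block_size)
            best = min(best, time.perf_counter() - t0)  # keep best of N runs
        timings[block_size] = best
    return min(timings, key=timings.get), timings


data = list(range(100_000))
candidates = [64, 256, 1024, 4096]
best_block, timings = autotune(blocked_sum, data, candidates)
print("fastest block size on this machine:", best_block)
```

The winning value depends on the machine that runs the benchmark, which is why the choice is measured rather than hard-coded.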
Gradient checkpointing
Gradient checkpointing trades extra computation for lower memory use: only selected activations are stored during the forward pass, and the rest are recomputed during the backward pass
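The trade-off can be sketched on a toy chain of scalar layers. Everything here is an assumption made for illustration: the four layers, their hand-written derivatives, and the `every` parameter that sets the checkpoint spacing.

```python
# Each layer is (f, df): a scalar function and its derivative (toy layers).
LAYERS = [
    (lambda x: 2.0 * x, lambda x: 2.0),
    (lambda x: x * x, lambda x: 2.0 * x),
    (lambda x: x + 1.0, lambda x: 1.0),
    (lambda x: 3.0 * x, lambda x: 3.0),
]


def grad_full(x):
    """Baseline: store every layer input, then backpropagate."""
    inputs = []
    for f, _ in LAYERS:
        inputs.append(x)
        x = f(x)
    grad = 1.0
    for (_, df), inp in zip(reversed(LAYERS), reversed(inputs)):
        grad *= df(inp)
    return grad, len(inputs)


def grad_checkpointed(x, every=2):
    """Store only every `every`-th layer input; recompute the rest in backward."""
    checkpoints = {}
    for i, (f, _) in enumerate(LAYERS):
        if i % every == 0:
            checkpoints[i] = x
        x = f(x)
    grad = 1.0
    for start in sorted(checkpoints, reverse=True):
        segment = LAYERS[start:start + every]
        inputs, h = [], checkpoints[start]
        for f, _ in segment:             # recompute this segment's activations
            inputs.append(h)
            h = f(h)
        for (_, df), inp in zip(reversed(segment), reversed(inputs)):
            grad *= df(inp)              # chain rule through the segment
    return grad, len(checkpoints)


g_full, n_full = grad_full(1.5)
g_ckpt, n_ckpt = grad_checkpointed(1.5)
print(g_full, n_full, g_ckpt, n_ckpt)  # 36.0 4 36.0 2
```

Both functions return the same gradient, but the checkpointed version keeps two activations between the forward and backward passes instead of four, at the cost of running each segment's forward computation a second time.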
Flashbulb memory
Flashbulb memories are vivid but not always accurate