Loop unrolling reduces loop overhead (branch and counter updates) by replicating the loop body so each pass handles several iterations, at the cost of larger code size
Image: Christian David, CC BY-SA 4.0, via Wikimedia Commons
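A minimal sketch of the idea, written by hand in Python purely for illustration. In practice a compiler applies this transformation to native code; the function names and the unroll factor of 4 are assumptions made for this example.

```python
def sum_rolled(values):
    """Baseline: one loop test and one index update per element."""
    total = 0
    for i in range(len(values)):
        total += values[i]
    return total


def sum_unrolled_by_4(values):
    """Body replicated 4 times: one loop test and one index update per 4 elements."""
    n = len(values)
    total = 0
    i = 0
    limit = n - (n % 4)  # largest multiple of 4 that fits
    while i < limit:
        total += values[i]
        total += values[i + 1]
        total += values[i + 2]
        total += values[i + 3]
        i += 4
    # Remainder loop handles the last n % 4 elements.
    while i < n:
        total += values[i]
        i += 1
    return total
```

Both functions return the same sum; the unrolled version trades a longer body (and a remainder loop) for fewer loop-control steps.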
Overlapping subproblems
Dynamic programming exploits overlapping subproblems by storing each subproblem's result so it is computed only once, avoiding redundant calculations
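As an illustration, Fibonacci numbers are a common example of overlapping subproblems. The sketch below compares a naive recursion with a memoized one; the function names and the call counter are assumptions introduced for this example.

```python
from functools import lru_cache


def fib_naive(n, counter):
    """Recomputes the same subproblems many times; counter[0] tracks calls."""
    counter[0] += 1
    if n < 2:
        return n
    return fib_naive(n - 1, counter) + fib_naive(n - 2, counter)


@lru_cache(maxsize=None)
def fib_memo(n):
    """Each distinct n is computed once; later requests hit the cache."""
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)


calls = [0]
naive_result = fib_naive(20, calls)
memo_result = fib_memo(20)
```

The two results agree, but the naive version makes thousands of calls for n = 20 while the memoized version evaluates only 21 distinct inputs.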
Instruction tuning fine-tunes a pretrained model on (instruction, response) pairs
Operator fusion, done at the compiler level, merges adjacent operations into a single pass so intermediate results are not written to and read back from memory, reducing memory traffic
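A rough sketch of what fusion changes, using plain Python lists in place of tensors. The scale, bias, and ReLU chain and the function names are assumptions chosen for illustration; a real compiler would fuse kernels rather than list comprehensions.

```python
def unfused(xs, scale, bias):
    """Three separate passes, each materializing an intermediate list."""
    scaled = [x * scale for x in xs]       # pass 1: write intermediate
    shifted = [s + bias for s in scaled]   # pass 2: read it back, write another
    return [max(0.0, v) for v in shifted]  # pass 3: ReLU


def fused(xs, scale, bias):
    """One pass: each element is read once and written once."""
    return [max(0.0, x * scale + bias) for x in xs]
```

Both versions compute the same values; the fused one simply never stores `scaled` or `shifted`.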
Gradient accumulation simulates a larger batch size without the extra memory: a large batch is split into smaller micro-batches, and their gradients are accumulated before a single update of the model weights
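A small sketch of why this works, assuming a one-parameter linear model `y = w * x` with mean squared error. The model, data, and function names are hypothetical; the point is only that weighted micro-batch gradients add up to the full-batch gradient.

```python
def mse_grad(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Process micro-batches one at a time and accumulate their gradients."""
    n = len(xs)
    grad = 0.0
    for start in range(0, n, micro_batch_size):
        mx = xs[start:start + micro_batch_size]
        my = ys[start:start + micro_batch_size]
        # Weight each micro-batch by its share of the full batch.
        grad += mse_grad(w, mx, my) * len(mx) / n
    return grad


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
full = mse_grad(0.5, xs, ys)
accumulated = accumulated_grad(0.5, xs, ys, micro_batch_size=2)
```

Here `full` and `accumulated` match up to floating-point rounding, while only one micro-batch needs to be held in memory at a time.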
Warp divergence hurts performance: when threads in the same warp take different branches, the paths run one after another with the inactive threads masked off, leading to idle cycles and reduced throughput
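A toy cost model of this effect, written in Python rather than GPU code. It assumes a warp of 32 threads running in lockstep and fixed, made-up cycle costs per branch; it is a simplification for intuition, not a description of any specific hardware.

```python
WARP_SIZE = 32


def warp_cycles(branch_taken, then_cost, else_cost):
    """Cycles for one warp: every branch path used by any thread runs serially."""
    cycles = 0
    if any(branch_taken):      # at least one thread takes the 'then' path
        cycles += then_cost
    if not all(branch_taken):  # at least one thread takes the 'else' path
        cycles += else_cost
    return cycles


uniform = [True] * WARP_SIZE                        # all threads agree
diverged = [i % 2 == 0 for i in range(WARP_SIZE)]   # threads alternate

uniform_cycles = warp_cycles(uniform, then_cost=10, else_cost=10)
diverged_cycles = warp_cycles(diverged, then_cost=10, else_cost=10)
```

Under these assumptions the diverged warp takes twice as many cycles as the uniform one, with half the threads idle during each path.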
RoPE (Rotary Position Embedding) advantage: encodes relative position by rotating query and key vectors, which supports length extrapolation beyond the training context length, typically with the help of frequency-scaling methods
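A minimal sketch of the rotation behind RoPE, in pure Python. It assumes the commonly used frequency base of 10000 and tiny 4-dimensional vectors; the helper names are hypothetical. It shows that the attention score between a rotated query and key depends only on their position offset.

```python
import math


def rotate_pairs(vec, position, base=10000.0):
    """Rotate each consecutive pair of dimensions by a position-dependent angle."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)  # lower frequency for later pairs
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def score(q, k, q_pos, k_pos):
    """Dot product of the rotated query and key."""
    rq = rotate_pairs(q, q_pos)
    rk = rotate_pairs(k, k_pos)
    return sum(a * b for a, b in zip(rq, rk))


q = [0.3, -1.2, 0.8, 0.5]
k = [1.1, 0.4, -0.7, 0.9]
near = score(q, k, q_pos=3, k_pos=1)     # offset of 2
far = score(q, k, q_pos=100, k_pos=98)   # same offset, much later positions
```

`near` and `far` agree up to floating-point rounding, because only the offset between the two positions enters the score.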