Dynamic programming solves problems with overlapping subproblems by storing each subproblem's result, so that no subproblem is computed more than once
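A minimal sketch of the idea, using Fibonacci numbers as the stand-in problem (the choice of problem and the names `fib_naive`, `fib_memo`, and `calls` are illustrative assumptions, not from the card). The naive version recomputes the same subproblems many times; the memoized version caches each result with the standard library's `functools.lru_cache`.

```python
from functools import lru_cache

# Hypothetical call counters, used only to make the redundant work visible.
calls = {"naive": 0, "memo": 0}

def fib_naive(n: int) -> int:
    """Plain recursion: the same subproblems are solved again and again."""
    calls["naive"] += 1
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n: int) -> int:
    """Same recurrence, but each subproblem's result is stored after the first call."""
    calls["memo"] += 1
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)

if __name__ == "__main__":
    print(fib_naive(20), "computed with", calls["naive"], "calls")
    print(fib_memo(20), "computed with", calls["memo"], "calls")
```

Both functions return the same value; only the number of times the function body runs differs, which is the saving the card describes.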
Greedy vs dynamic programming: a greedy algorithm commits to the locally optimal choice at each step and never revisits it, while DP evaluates all relevant subproblems and combines their optimal solutions
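The difference can be seen on a small coin-change instance. This is a sketch under assumed inputs: the coin set `[1, 3, 4]`, the target `6`, and the function names are chosen for illustration only. Greedy always grabs the largest coin that fits, while the DP version fills in the best answer for every smaller amount first.

```python
def greedy_coins(coins, amount):
    """Take as many of the largest coin as fit, then move to the next largest."""
    count = 0
    for coin in sorted(coins, reverse=True):
        take, amount = divmod(amount, coin)
        count += take
    return count if amount == 0 else None

def dp_coins(coins, amount):
    """best[a] holds the fewest coins summing to a, built from smaller amounts."""
    inf = float("inf")
    best = [0] + [inf] * amount
    for sub in range(1, amount + 1):
        for coin in coins:
            if coin <= sub and best[sub - coin] + 1 < best[sub]:
                best[sub] = best[sub - coin] + 1
    return best[amount] if best[amount] != inf else None

if __name__ == "__main__":
    print("greedy:", greedy_coins([1, 3, 4], 6))  # 4 + 1 + 1
    print("dp:    ", dp_coins([1, 3, 4], 6))      # 3 + 3
```

On this input greedy uses three coins and DP finds the two-coin answer. For some coin systems greedy happens to be optimal, so this illustrates the general risk, not a rule that greedy always loses.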
Fused kernels combine multiple operations into a single kernel, so intermediate results stay in registers or on-chip memory instead of making round-trips through global memory
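The data flow can be sketched in plain Python. This is only an analogy: real fused kernels run on a GPU or other accelerator, and the operation chosen here (scale, add bias, ReLU) and the function names are assumptions for illustration. The unfused version writes a full intermediate buffer after every step; the fused version touches each element once.

```python
def unfused(xs, scale, bias):
    """Three separate passes, each writing a full-size intermediate buffer."""
    scaled = [x * scale for x in xs]        # pass 1: read xs, write scaled
    shifted = [s + bias for s in scaled]    # pass 2: read scaled, write shifted
    return [max(0.0, t) for t in shifted]   # pass 3: read shifted, write output

def fused(xs, scale, bias):
    """One pass: the intermediates exist only as per-element temporaries."""
    return [max(0.0, x * scale + bias) for x in xs]

if __name__ == "__main__":
    data = [-2.0, -0.5, 0.0, 1.5]
    print(unfused(data, 2.0, 1.0))
    print(fused(data, 2.0, 1.0))
```

Both functions compute the same result. On real hardware the benefit comes from skipping the reads and writes of `scaled` and `shifted`, which is usually where memory-bound elementwise operations spend their time.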
Gradient checkpointing trades extra computation for lower memory use: it stores only some activations during the forward pass and recomputes the rest during the backward pass
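A toy sketch of the trade, assuming a chain of scalar layers `tanh(w * x)` (the layer, the names, and the `every` parameter are hypothetical, and no deep learning framework is involved). The full version keeps every layer input; the checkpointed version keeps one input per segment and recomputes the others when the backward pass reaches that segment.

```python
import math

def layer(w, x):
    return math.tanh(w * x)

def layer_grad_input(w, x):
    """Derivative of tanh(w * x) with respect to x."""
    return w * (1.0 - math.tanh(w * x) ** 2)

def backward_full(weights, x0):
    """Store every layer input during the forward pass, then backpropagate."""
    inputs = [x0]
    for w in weights:
        inputs.append(layer(w, inputs[-1]))
    grad = 1.0
    for w, x in zip(reversed(weights), reversed(inputs[:-1])):
        grad *= layer_grad_input(w, x)
    return grad, len(weights)  # number of layer inputs held in memory

def backward_checkpointed(weights, x0, every):
    """Store one input per segment of `every` layers; recompute the rest."""
    n = len(weights)
    checkpoints = {}
    x = x0
    for i, w in enumerate(weights):
        if i % every == 0:
            checkpoints[i] = x
        x = layer(w, x)
    grad = 1.0
    peak = len(checkpoints)
    for start in sorted(checkpoints, reverse=True):
        end = min(start + every, n)
        # Recompute this segment's layer inputs from its checkpoint.
        seg_inputs = [checkpoints[start]]
        for w in weights[start:end - 1]:
            seg_inputs.append(layer(w, seg_inputs[-1]))
        peak = max(peak, len(checkpoints) + len(seg_inputs) - 1)
        for w, x in zip(reversed(weights[start:end]), reversed(seg_inputs)):
            grad *= layer_grad_input(w, x)
    return grad, peak  # peak number of layer inputs held at once

if __name__ == "__main__":
    ws = [0.5 + 0.1 * i for i in range(12)]
    print(backward_full(ws, 0.3))
    print(backward_checkpointed(ws, 0.3, every=4))
```

The two gradients agree, while the checkpointed run holds fewer inputs at its peak and pays for that with a second forward computation inside each segment.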
Gradient accumulation simulates a larger batch size without extra activation memory: it splits a large batch into smaller micro-batches, processes them one at a time, and accumulates their gradients before updating the model weights once
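A small numerical check of why this works, assuming a one-parameter linear model with mean squared error (the model, the data, and the names `grad_mse` and `accumulated_grad` are illustrative assumptions). Because the loss is a mean over examples, the full-batch gradient equals the size-weighted sum of the micro-batch gradients.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def accumulated_grad(w, xs, ys, micro_batch):
    """Process the batch in slices and accumulate their weighted gradients."""
    total = len(xs)
    acc = 0.0
    for start in range(0, total, micro_batch):
        bx = xs[start:start + micro_batch]
        by = ys[start:start + micro_batch]
        # Weight each micro-batch mean by its share of the full batch.
        acc += grad_mse(w, bx, by) * len(bx) / total
    return acc

if __name__ == "__main__":
    xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
    ys = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 7.2, 7.9]
    w, lr = 0.0, 0.01
    print("full batch :", grad_mse(w, xs, ys))
    print("accumulated:", accumulated_grad(w, xs, ys, micro_batch=2))
    w -= lr * accumulated_grad(w, xs, ys, micro_batch=2)  # one update per full batch
    print("updated w  :", w)
```

Only one micro-batch needs to be in memory at a time, and the weight update happens once, after all slices have contributed. Layers whose behavior depends on batch statistics, such as batch normalization, are not covered by this equivalence.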
Batch size affects generalization: larger batches give more accurate, lower-variance gradient estimates but tend to converge to sharper minima, which are associated with worse generalization; the gradient noise of smaller batches tends to favor flatter minima
Kolmogorov complexity, the length of the shortest program that outputs a given string, is uncomputable: no algorithm returns it for every input
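Although the exact value cannot be computed, any compressor yields a computable upper bound: the compressed data plus a fixed-size decompressor is one program that reproduces the string. The sketch below uses `zlib` from the standard library as that compressor (the choice of `zlib`, the helper name `compressed_length`, and the sample inputs are assumptions for illustration). It shows only an upper bound, not the true complexity.

```python
import os
import zlib

def compressed_length(data: bytes) -> int:
    """Length in bytes of the zlib-compressed form: an upper-bound proxy only."""
    return len(zlib.compress(data, 9))

if __name__ == "__main__":
    regular = b"ab" * 5000      # highly regular: a short description exists
    noise = os.urandom(10000)   # random bytes: almost surely incompressible
    print("regular:", len(regular), "->", compressed_length(regular))
    print("noise  :", len(noise), "->", compressed_length(noise))
```

The regular string shrinks to a small fraction of its length, while the random bytes do not shrink at all. A compressor can miss structure it was not designed to find, so a large compressed size does not prove that a string is complex.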