Dynamic programming solves overlapping subproblems by storing results of subproblems to avoid redundant calculations

Image: Sora / OpenAI, Public domain, via Wikimedia Commons

Overlapping subproblems

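Dynamic programming applies when a problem decomposes into subproblems that recur many times. Each subproblem is solved once and its result is stored (top-down memoization or bottom-up tabulation), so later occurrences become lookups instead of redundant calculations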
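
As a minimal sketch of the idea, the Fibonacci numbers are a standard toy case: the naive recursion solves the same subproblems repeatedly, while the memoized and tabulated versions solve each one once. The function names and the `counter` argument are illustrative choices, not part of any particular library.

```python
from functools import lru_cache


def fib_naive(n, counter):
    """Plain recursion: the same subproblems are solved again and again."""
    counter[0] += 1  # count how many calls are made in total
    if n < 2:
        return n
    return fib_naive(n - 1, counter) + fib_naive(n - 2, counter)


@lru_cache(maxsize=None)
def fib_memo(n):
    """Top-down memoization: each subproblem is computed once, then cached."""
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)


def fib_table(n):
    """Bottom-up tabulation: build from the smallest subproblems upward."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```

All three return the same value; the difference is in how many times a subproblem is evaluated. For `n = 20` the naive version makes thousands of calls, while the memoized one computes only the 21 distinct subproblems.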

Related concepts

Greedy vs dynamic programming: greedy makes locally optimal choices, DP considers all subproblems

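Greedy: commits to the locally optimal choice at each step and never revisits it. DP: solves every subproblem and combines their optimal solutions, so it still finds the global optimum where a greedy choice fails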
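
A small sketch of where the two approaches diverge, using coin change as an assumed example (it does not appear on the card itself). With the hypothetical coin set `[1, 3, 4]` and target 6, always taking the largest coin uses three coins, while DP over all smaller amounts finds two.

```python
def greedy_coins(coins, amount):
    """Always take as many of the largest remaining coin as possible."""
    count = 0
    for c in sorted(coins, reverse=True):
        take = amount // c
        count += take
        amount -= take * c
    return count if amount == 0 else None


def dp_coins(coins, amount):
    """best[a] = fewest coins that sum to a, built from all smaller amounts."""
    inf = float("inf")
    best = [0] + [inf] * amount
    for a in range(1, amount + 1):
        for c in coins:
            if c <= a and best[a - c] + 1 < best[a]:
                best[a] = best[a - c] + 1
    return best[amount] if best[amount] != inf else None


# Greedy picks 4 + 1 + 1; DP finds 3 + 3.
greedy_result = greedy_coins([1, 3, 4], 6)
dp_result = dp_coins([1, 3, 4], 6)
```

For some coin systems, such as `[1, 5, 10, 25]`, the greedy answer happens to match the DP answer, which is why greedy is attractive when its optimality can be proven.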

what fused kernels do

Fused kernels combine multiple operations into one kernel to avoid memory round-trips
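
The data-flow difference can be sketched in plain Python, with the caveat that this is only an analogy: real fusion happens in GPU kernels or compiler-generated code, and the function names here are hypothetical. The unfused version materializes an intermediate buffer between two passes; the fused version does both operations in a single pass.

```python
def scale_then_add_unfused(x, scale, bias):
    """Two separate passes with an intermediate buffer in between."""
    # "Kernel" 1: read x, write a full temporary result.
    tmp = [v * scale for v in x]
    # "Kernel" 2: read the temporary back, write the final result.
    return [t + bias for t in tmp]


def scale_then_add_fused(x, scale, bias):
    """One pass: each element is read once, transformed, and written once."""
    return [v * scale + bias for v in x]
```

Both functions return the same values. The fused one skips writing and re-reading `tmp`, which is the memory round-trip that fusion is meant to avoid.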

gradient checkpointing trades compute for memory: recomputes activations to save memory

Gradient checkpointing trades extra computation for memory savings: only some activations are stored during the forward pass, and the rest are recomputed when the backward pass needs them
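
A toy sketch of the mechanism, assuming a chain of scalar layers `tanh(w * x)` (an illustrative model, not a framework API). `grad_full` stores every layer input; `grad_checkpointed` stores only every `every`-th input and recomputes each segment during the backward walk. Both return the same gradient of the output with respect to the input, along with how many activations were kept from the forward pass.

```python
import math


def layer(w, x):
    return math.tanh(w * x)


def layer_grad(w, x):
    # d/dx tanh(w * x) = w * (1 - tanh(w * x)^2)
    return w * (1.0 - math.tanh(w * x) ** 2)


def grad_full(weights, x):
    """Store every layer input in the forward pass, then backpropagate."""
    inputs = []
    for w in weights:
        inputs.append(x)
        x = layer(w, x)
    g = 1.0
    for w, xin in zip(reversed(weights), reversed(inputs)):
        g *= layer_grad(w, xin)
    return g, len(inputs)


def grad_checkpointed(weights, x, every=2):
    """Store only every `every`-th input; recompute the rest when needed."""
    checkpoints = {}
    for i, w in enumerate(weights):
        if i % every == 0:
            checkpoints[i] = x
        x = layer(w, x)
    g = 1.0
    # Walk the segments from last to first, as backpropagation would.
    for start in sorted(checkpoints, reverse=True):
        end = min(start + every, len(weights))
        # Recompute this segment's layer inputs from its checkpoint.
        seg_inputs = []
        xin = checkpoints[start]
        for i in range(start, end):
            seg_inputs.append(xin)
            xin = layer(weights[i], xin)
        for i in reversed(range(start, end)):
            g *= layer_grad(weights[i], seg_inputs[i - start])
    return g, len(checkpoints)
```

With six layers and `every=3`, the checkpointed version keeps two activations from the forward pass instead of six, at the cost of running each segment's forward computation a second time.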

gradient accumulation simulates larger batch sizes without more memory

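Gradient accumulation simulates a larger batch by splitting it into smaller mini-batches, running the forward and backward pass on each, and accumulating the gradients before a single weight update. Peak activation memory is that of one mini-batch rather than the full batch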
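
A minimal sketch of why this works, assuming a one-parameter model `y = w * x` with mean squared error (chosen only for illustration). Summing the gradients of the smaller batches, each weighted by its share of the samples, reproduces the gradient of the full batch.

```python
def mse_grad(w, xs, ys):
    """Gradient of the mean squared error for the 1-D model y = w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch):
    """Accumulate per-chunk gradients so the total matches the full-batch mean."""
    n = len(xs)
    total = 0.0
    for i in range(0, n, micro_batch):
        bx = xs[i:i + micro_batch]
        by = ys[i:i + micro_batch]
        # Weight each chunk by its share of the samples (handles a short last chunk).
        total += mse_grad(w, bx, by) * len(bx) / n
    return total


def sgd_step(w, grad, lr=0.01):
    """One weight update, applied only after all chunks have been accumulated."""
    return w - lr * grad
```

Only one chunk's worth of data is processed at a time, yet the update applied by `sgd_step` is the one a single large batch would have produced, up to floating-point rounding.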

batch size affects generalization: larger batches find sharper minima

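Larger batch sizes tend to converge to sharper minima, which are associated with worse generalization. Their gradient estimates are more accurate, but the reduced gradient noise makes it less likely that training escapes a sharp basin for a flatter one; this is an empirical tendency, not a guarantee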

Kolmogorov complexity

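The Kolmogorov complexity of a string is the length of the shortest program that outputs it. It is uncomputable: no algorithm returns it for every input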
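
Although the exact value cannot be computed, any compressor gives a computable upper bound (up to an additive constant for the decompressor). The sketch below uses `zlib` as a stand-in, which is an assumption made for illustration: a short compressed size shows that a short description exists, but a long compressed size proves nothing about the true complexity.

```python
import os
import zlib


def compressed_length(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input: an upper-bound proxy only."""
    return len(zlib.compress(data, 9))


# A highly regular string has a short description ("repeat 'ab' 5000 times").
regular = b"ab" * 5000

# Random bytes almost surely have no description much shorter than themselves.
random_like = os.urandom(10000)

regular_size = compressed_length(regular)
random_size = compressed_length(random_like)
```

Both inputs are 10,000 bytes long, but the regular one compresses to a tiny fraction of that while the random one does not shrink at all.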
