Newton's method has quadratic convergence but requires cubic computational cost per iteration

Why second-order methods (Newton's) converge faster but are expensive: O(n³) per step

Newton's method converges quadratically near a solution, but each iteration must solve an n×n linear system involving the Hessian, which costs O(n³) with a dense direct solver
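A minimal sketch of the quadratic convergence, assuming a one-dimensional test function f(x) = x - ln(x) chosen here only for illustration (its minimum is at x = 1). The helper name `newton_minimize` is hypothetical. In one dimension the Newton step is a single division; in n dimensions that division becomes a linear solve with the n×n Hessian, which is where the O(n³) cost comes from.

```python
def newton_minimize(grad, hess, x0, steps):
    """Scalar Newton iteration x <- x - f'(x) / f''(x); returns all iterates."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - grad(x) / hess(x))
    return xs


# Illustrative objective (an assumption, not from the card): f(x) = x - ln(x)
grad = lambda x: 1.0 - 1.0 / x       # f'(x)
hess = lambda x: 1.0 / (x * x)       # f''(x)

iterates = newton_minimize(grad, hess, x0=0.5, steps=6)
errors = [abs(x - 1.0) for x in iterates]

# For this particular function each error is the square of the previous one,
# so the number of correct digits roughly doubles per step.
for k, e in enumerate(errors):
    print(k, e)
```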

Related concepts

Time complexity of binary search: O(log n) — halves search space each step

Binary search on a sorted array halves the remaining search space with each comparison, so it needs O(log n) steps
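The halving can be sketched as follows. The function name `binary_search` and the step counter are illustrative additions, and the input is assumed to be sorted in ascending order.

```python
def binary_search(items, target):
    """Return (index, steps); index is -1 if target is absent.

    Assumes `items` is sorted in ascending order.
    """
    lo, hi = 0, len(items) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid, steps
        if items[mid] < target:
            lo = mid + 1      # discard the lower half
        else:
            hi = mid - 1      # discard the upper half
    return -1, steps


data = list(range(1024))
index, steps = binary_search(data, 777)
missing_index, missing_steps = binary_search(data, -5)
```

For 1024 elements the loop never runs more than 11 times, since each pass discards at least half of what remains.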

Why non-convex loss landscapes are hard: many local minima and saddle points

Non-convex landscapes have numerous local minima and saddle points, complicating optimization
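As a small illustration, assuming a one-dimensional non-convex function f(x) = x⁴ - 3x² + x picked only for this sketch, plain gradient descent ends up in different minima depending on where it starts. The helper name `gradient_descent`, the learning rate, and the starting points are all hypothetical choices.

```python
def f(x):
    return x**4 - 3 * x**2 + x


def f_grad(x):
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x0, lr=0.01, iters=2000):
    """Fixed-step gradient descent on the scalar function f."""
    x = x0
    for _ in range(iters):
        x -= lr * f_grad(x)
    return x


left = gradient_descent(-2.0)    # settles in the minimum on the negative side
right = gradient_descent(2.0)    # settles in the minimum on the positive side

# Both points have (near) zero gradient, yet one has a lower loss than the other.
print(left, f(left))
print(right, f(right))
```

The run started at 2.0 stops at a point with a higher loss than the run started at -2.0, and the gradient alone gives no hint that a better minimum exists.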

Why attention is O(n²) in sequence length: every token attends to every other token

Self-attention computes a score for every pair of tokens, so for a sequence of n tokens the score matrix has n² entries and both time and memory grow quadratically with n
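A minimal pure-Python sketch of scaled dot-product attention, written to make the pairwise cost visible rather than to be efficient. The function name `attention`, the score counter, and the toy inputs are assumptions for illustration; real implementations use batched matrix operations.

```python
import math


def attention(Q, K, V):
    """Scaled dot-product attention over lists of vectors, one per token.

    Returns the output vectors and the number of query-key scores computed.
    """
    d = len(Q[0])
    outputs = []
    n_scores = 0
    for q in Q:
        # One score per key: this inner loop is what makes the total n * n.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        n_scores += len(scores)
        # Numerically stable softmax over the scores.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        weights = [w / z for w in weights]
        # Weighted average of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]
        )
    return outputs, n_scores


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 toy tokens of dimension 2
out, n_scores = attention(X, X, X)          # 3 tokens -> 9 scores
```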

Why proximal gradient descent is needed for L1 optimization

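Proximal gradient descent handles the non-differentiable L1 penalty by following each gradient step on the smooth part of the loss with a soft-thresholding (proximal) step, which sets small coefficients exactly to zero and so produces sparse solutions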
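A minimal sketch of the idea, assuming the toy objective 0.5 * ||x - b||² + lam * ||x||₁, whose smooth part has gradient x - b. The names `soft_threshold` and `proximal_gradient`, the vector `b`, and the step size are illustrative assumptions.

```python
def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward zero by t."""
    return [max(abs(x) - t, 0.0) * (1.0 if x > 0 else -1.0) for x in v]


def proximal_gradient(grad_smooth, x0, lam, step, iters):
    """Gradient step on the smooth term, then the L1 proximal step."""
    x = list(x0)
    for _ in range(iters):
        g = grad_smooth(x)
        x = soft_threshold([xi - step * gi for xi, gi in zip(x, g)], step * lam)
    return x


b = [3.0, 0.2, -1.5]
grad_smooth = lambda x: [xi - bi for xi, bi in zip(x, b)]

x = proximal_gradient(grad_smooth, [0.0, 0.0, 0.0], lam=0.5, step=0.5, iters=100)
print(x)   # the small middle coefficient is driven exactly to zero
```

A plain subgradient step would typically leave the middle entry small but nonzero; the thresholding step is what yields an exact zero.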

How KV-cache reduces redundant computation in autoregressive generation

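A KV-cache stores the key and value vectors of tokens that have already been processed, so each new generation step computes them only for the newest token instead of recomputing them for the whole prefix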
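The saving can be sketched by counting projections in a toy decode loop. Everything here is a stand-in: `project` is a hypothetical placeholder for the key and value projections, and the attention computation itself is omitted.

```python
def project(x, w):
    """Toy stand-in for a key or value projection: scale each component by w."""
    return [w * xi for xi in x]


def decode_without_cache(tokens):
    """Recompute keys and values for the whole prefix at every step."""
    keys, values, projections = [], [], 0
    for t in range(1, len(tokens) + 1):
        keys = [project(x, 2.0) for x in tokens[:t]]
        values = [project(x, 3.0) for x in tokens[:t]]
        projections += 2 * t
    return keys, values, projections


def decode_with_cache(tokens):
    """Project only the newest token and append it to the cache."""
    keys, values, projections = [], [], 0
    for x in tokens:
        keys.append(project(x, 2.0))
        values.append(project(x, 3.0))
        projections += 2
    return keys, values, projections


tokens = [[float(i)] for i in range(1, 9)]          # 8 toy token embeddings
k_slow, v_slow, n_slow = decode_without_cache(tokens)
k_fast, v_fast, n_fast = decode_with_cache(tokens)
```

Both loops end with the same keys and values, but the uncached loop performs a number of projections that grows quadratically with the sequence length, while the cached loop grows linearly.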

Time complexity of Dijkstra's algorithm: O((V+E) log V) with a priority queue

Dijkstra's algorithm runs in O((V+E) log V) with a binary-heap priority queue; a Fibonacci heap improves this to O(E + V log V)
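A minimal sketch using Python's `heapq` module, which implements a binary heap. The adjacency-list format and the example graph are assumptions for illustration, and edge weights are assumed to be non-negative. Stale heap entries are skipped instead of being updated in place, which keeps the code short and still gives O(E log V) heap work.

```python
import heapq


def dijkstra(graph, source):
    """Shortest distances from source; graph maps node -> list of (neighbor, weight)."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale entry: a shorter path was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


graph = {
    "a": [("b", 4), ("c", 1)],
    "b": [("d", 1)],
    "c": [("b", 2), ("d", 7)],
    "d": [],
}
dist = dijkstra(graph, "a")
```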
