Gradient points uphill in the direction of steepest increase of f

Gradient

The gradient of a function points in the direction of steepest ascent, and its magnitude gives the rate of increase in that direction.

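Under rotations of the coordinate system the gradient transforms like a vector, so the direction of steepest ascent does not depend on how the axes are oriented. Under general coordinate changes its components transform as those of a covector.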

Stationary points, where the gradient is zero, are crucial in optimization as they indicate potential maxima, minima, or saddle points.

Example

For f(x, y) = x^2 + y^2, the gradient ∇f = (2x, 2y) points radially away from the origin, which is the direction in which f increases fastest.
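
As a quick check of this example, the analytic gradient can be compared against a finite-difference estimate. This is a minimal sketch; the helper name numerical_gradient and the step size h are illustrative choices, not part of the original card.

```python
def f(x, y):
    # The example function from the card.
    return x**2 + y**2


def numerical_gradient(func, x, y, h=1e-6):
    # Central differences approximate each partial derivative.
    # The step size h is an assumed, illustrative value.
    df_dx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    df_dy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return df_dx, df_dy


gx, gy = numerical_gradient(f, 1.0, 2.0)
print(gx, gy)  # approximately the analytic gradient (2x, 2y) = (2.0, 4.0)
```

At the point (1, 2) the estimate agrees with (2x, 2y) up to floating-point error.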

Gradient-based optimizers rely on this direction: gradient descent steps along −∇f to decrease a loss, which is how most machine learning models are trained.

Related concepts

Momentum term: v_t = βv_{t-1} + ∇L accumulates the gradient direction

The momentum term accelerates convergence along directions where successive gradients agree, and damps oscillation where they alternate in sign.
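
The momentum recurrence can be sketched on a toy problem. The loss L(θ) = θ^2, the parameter update θ ← θ − lr · v, and the values of lr and beta below are assumptions made for illustration; the card itself only gives the velocity update.

```python
def grad(theta):
    # Gradient of the assumed toy loss L(theta) = theta**2.
    return 2.0 * theta


def momentum_descent(theta, lr=0.1, beta=0.9, steps=200):
    v = 0.0  # velocity starts at zero
    for _ in range(steps):
        v = beta * v + grad(theta)  # v_t = beta * v_{t-1} + grad L
        theta = theta - lr * v      # assumed parameter update
    return theta


theta_final = momentum_descent(5.0)
print(theta_final)  # close to the minimizer at 0
```

With beta = 0 the loop reduces to plain gradient descent, which makes it easy to compare the two on the same loss.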

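Adam has bias correction: divides by (1-β^t), which matters most in early steps

Adam's bias correction divides each moment estimate by (1-β^t). The moving averages are initialized at zero, so they are biased toward zero in early steps; the correction removes that bias and fades as β^t approaches zero.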
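
The effect of the correction is easiest to see on a constant gradient. The sketch below tracks only the first-moment estimate, assuming the usual exponential moving average m_t = βm_{t-1} + (1-β)g_t started at zero; the function name and the choice β = 0.9 are illustrative.

```python
def biased_and_corrected_mean(grads, beta=0.9):
    m = 0.0  # moving average starts at zero, which causes the bias
    history = []
    for t, g in enumerate(grads, start=1):
        m = beta * m + (1 - beta) * g  # biased first-moment estimate
        m_hat = m / (1 - beta**t)      # bias-corrected estimate
        history.append((m, m_hat))
    return history


history = biased_and_corrected_mean([1.0] * 5)
for m, m_hat in history:
    print(round(m, 4), round(m_hat, 4))
```

For a constant gradient of 1 the raw average starts near 0.1 and climbs slowly, while the corrected estimate stays at 1 from the first step.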

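Saddle points are more common than local minima in high dimensions

A stationary point is a saddle when the Hessian has both positive and negative eigenvalues, so the function curves up in some directions and down in others. A local minimum needs every eigenvalue to be positive, which becomes increasingly unlikely as the number of dimensions grows.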
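
A stationary point can be classified from the signs of the Hessian eigenvalues. The sketch below handles the two-variable case with the closed-form eigenvalues of a symmetric 2x2 matrix; the function names are hypothetical and chosen for illustration.

```python
import math


def symmetric_2x2_eigenvalues(a, b, c):
    # Eigenvalues of the Hessian [[a, b], [b, c]], smallest first.
    mean = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b**2)
    return mean - radius, mean + radius


def classify_stationary_point(a, b, c):
    lo, hi = symmetric_2x2_eigenvalues(a, b, c)
    if lo > 0:
        return "local minimum"
    if hi < 0:
        return "local maximum"
    if lo < 0 < hi:
        return "saddle point"
    return "inconclusive"  # a zero eigenvalue: second-order test fails


print(classify_stationary_point(2, 0, 2))   # f = x^2 + y^2 at the origin
print(classify_stationary_point(2, 0, -2))  # f = x^2 - y^2 at the origin
print(classify_stationary_point(0, 1, 0))   # f = x*y at the origin
```

Note that f = x^2 − y^2 has a zero mixed partial yet is still a saddle, while f = xy is a saddle with a nonzero one; what decides the classification is the sign pattern of the eigenvalues.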

AdaGrad's learning rate decays to zero

AdaGrad divides the learning rate by the square root of the accumulated squared gradients. That sum never decreases, so the effective learning rate shrinks toward zero as training proceeds.
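
The decay can be made concrete by tracking the effective step size lr / (sqrt(G_t) + eps), where G_t is the running sum of squared gradients. This is a single-parameter sketch; the constant gradient of 1 and the values of lr and eps are assumptions for illustration.

```python
import math


def adagrad_effective_rates(grads, lr=0.1, eps=1e-8):
    accum = 0.0  # running sum of squared gradients, never decreases
    rates = []
    for g in grads:
        accum += g * g
        rates.append(lr / (math.sqrt(accum) + eps))
    return rates


rates = adagrad_effective_rates([1.0] * 100)
print(rates[0], rates[9], rates[99])  # shrinks roughly like lr / sqrt(t)
```

With a constant gradient the accumulated sum grows linearly in t, so the effective rate falls off like 1/sqrt(t) rather than exponentially.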

Geodesics on an ellipsoid

Geodesics are the locally shortest paths on a curved surface, the analogue of straight lines in flat space.

Learning rate warmup: starts small to avoid early training instability

Learning rate warmup gradually increases the learning rate from zero (or a small value) to a predefined target over the first training steps, which stabilizes early training.
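
One common form is a linear ramp. The sketch below assumes a linear schedule that holds the target rate after warmup; the function name warmup_lr and the values of base_lr and warmup_steps are illustrative, and real schedules often follow the ramp with a decay phase.

```python
def warmup_lr(step, base_lr=1e-3, warmup_steps=1000):
    # Linear ramp from 0 to base_lr over the first warmup_steps steps.
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    # After warmup, hold the target rate (assumed; no decay modeled here).
    return base_lr


for step in (0, 250, 500, 1000, 5000):
    print(step, warmup_lr(step))
```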
