Gradient points uphill in the direction of steepest increase of f
The gradient ∇f points in the direction in which f increases fastest locally, and its magnitude is the rate of that increase.
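Under rotations of the coordinate system, the gradient transforms like a vector, so the direction of steepest ascent does not depend on how the axes are oriented. (Under general, non-orthonormal coordinate changes it transforms covariantly, as a one-form.)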
Stationary points, where the gradient is zero, are crucial in optimization as they indicate potential maxima, minima, or saddle points.
Example
For f(x, y) = x^2 + y^2, the gradient ∇f = (2x, 2y) points uphill in the direction of steepest increase.
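A minimal Python sketch of this example: the analytic gradient (2x, 2y) checked against central finite differences. The helper names and the step size h are illustrative assumptions.

```python
def f(x, y):
    return x**2 + y**2

def grad_f(x, y):
    # Analytic gradient of f: (df/dx, df/dy) = (2x, 2y)
    return (2 * x, 2 * y)

def numeric_grad(func, x, y, h=1e-6):
    # Central finite differences approximate each partial derivative
    dfdx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    dfdy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return (dfdx, dfdy)

print(grad_f(1.0, 2.0))           # (2.0, 4.0)
print(numeric_grad(f, 1.0, 2.0))  # approximately (2.0, 4.0)
```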
Gradient descent minimizes a function by repeatedly stepping in the opposite direction, −∇f; this is the basis of most model training in machine learning.
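A sketch of plain gradient descent on the same f, assuming a fixed learning rate of 0.1 (an illustrative choice). Each step multiplies both coordinates by 1 − 2·0.1 = 0.8, so the iterate shrinks geometrically toward the minimum at (0, 0).

```python
def grad_f(x, y):
    # Gradient of f(x, y) = x^2 + y^2
    return (2 * x, 2 * y)

def gradient_descent(grad, start, lr=0.1, steps=50):
    # Repeatedly step against the gradient: p <- p - lr * grad(p)
    x, y = start
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y - lr * gy
    return x, y

x, y = gradient_descent(grad_f, (3.0, -4.0))
print(x, y)  # both close to 0, the minimizer of f
```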
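Momentum: v_t = βv_{t-1} + ∇L(θ_{t-1}), then θ_t = θ_{t-1} − η v_t, so the velocity v_t accumulates an exponentially decaying sum of past gradients.
Momentum speeds up convergence along directions where successive gradients agree and damps oscillation along directions where they alternate in sign.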
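A scalar sketch of the momentum update on a hypothetical loss L(θ) = θ², with illustrative values η = 0.1 and β = 0.9. The v_t form matches the one PyTorch's `torch.optim.SGD` uses with its default `dampening=0`, but the function below is a plain illustration, not that API.

```python
def momentum_descent(grad, theta, lr=0.1, beta=0.9, steps=300):
    v = 0.0  # velocity starts at zero
    for _ in range(steps):
        g = grad(theta)
        v = beta * v + g        # v_t = beta * v_{t-1} + grad L(theta_{t-1})
        theta = theta - lr * v  # theta_t = theta_{t-1} - lr * v_t
    return theta

# grad of L(theta) = theta^2 is 2 * theta
theta = momentum_descent(lambda t: 2 * t, theta=5.0)
print(theta)  # close to 0, the minimizer of theta^2
```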
Adam's bias correction divides the moment estimates by (1 − β^t): m̂_t = m_t / (1 − β₁^t) and v̂_t = v_t / (1 − β₂^t).
Because m_0 = v_0 = 0, the raw moving averages are biased toward zero in early steps; the correction cancels this bias and fades out as β^t → 0.
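A scalar sketch of one Adam step with the bias correction switchable, using the common default hyperparameters β₁ = 0.9, β₂ = 0.999, ε = 1e-8 and an illustrative learning rate of 0.01. The function name and the `bias_correction` flag are hypothetical. On the very first step the corrected update has size about lr, while the uncorrected one is inflated by 0.1 / √0.001 = √10 ≈ 3.16.

```python
import math

def adam_update(theta, g, m, v, t, lr=0.01, beta1=0.9, beta2=0.999,
                eps=1e-8, bias_correction=True):
    """One Adam step for a scalar parameter; t counts from 1."""
    m = beta1 * m + (1 - beta1) * g      # first-moment EMA
    v = beta2 * v + (1 - beta2) * g * g  # second-moment EMA
    if bias_correction:
        m_hat = m / (1 - beta1 ** t)     # undo the pull toward m_0 = 0
        v_hat = v / (1 - beta2 ** t)     # undo the pull toward v_0 = 0
    else:
        m_hat, v_hat = m, v
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# First step from theta = 1.0 with gradient g = 2.0 (e.g. L = theta^2)
for corrected in (True, False):
    new_theta, _, _ = adam_update(1.0, 2.0, 0.0, 0.0, t=1, bias_correction=corrected)
    print(corrected, round(1.0 - new_theta, 4))
# True 0.01
# False 0.0316
```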
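Saddle points are more common than local minima in high dimensions.
A saddle point is a stationary point where the Hessian has both positive and negative eigenvalues, so f curves up along some directions and down along others. A local minimum needs all n eigenvalues to be positive; as a rough heuristic, if each sign were equally likely, that would happen with probability only 2^-n.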
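A sketch of the second-derivative test for a function of two variables at a point where ∇f = 0. The function name and its arguments (the three distinct Hessian entries) are illustrative. Note that f = xy is a saddle even though its pure second derivatives are zero, which is why the test looks at the signs of the Hessian's eigenvalues (via the determinant, their product) rather than at individual partial derivatives.

```python
def classify_critical_point(hxx, hxy, hyy):
    # Hessian H = [[hxx, hxy], [hxy, hyy]]; det(H) = product of its eigenvalues
    det = hxx * hyy - hxy * hxy
    if det < 0:
        return "saddle"  # eigenvalues have opposite signs
    if det > 0:
        # Both eigenvalues share a sign; hxx reveals which one
        return "minimum" if hxx > 0 else "maximum"
    return "inconclusive"  # a zero eigenvalue: the test says nothing

print(classify_critical_point(2, 0, 2))   # f = x^2 + y^2 -> minimum
print(classify_critical_point(2, 0, -2))  # f = x^2 - y^2 -> saddle
print(classify_critical_point(0, 1, 0))   # f = x*y       -> saddle
```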
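AdaGrad's effective learning rate decays toward zero.
AdaGrad divides the learning rate by the square root of the running sum of squared gradients, G_t = Σ g_i^2. G_t never decreases, so the effective step η / √G_t keeps shrinking; with gradients of roughly constant size, G_t grows linearly and the step falls like 1/√t.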
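A scalar sketch of AdaGrad's effective learning rate, assuming a constant gradient of 1.0 and an illustrative base rate η = 0.1. With G_t = t, the effective rate is 0.1 / √t.

```python
import math

def adagrad_effective_lrs(grads, lr=0.1, eps=1e-8):
    # Running sum of squared gradients: G_t = g_1^2 + ... + g_t^2
    G = 0.0
    rates = []
    for g in grads:
        G += g * g
        rates.append(lr / (math.sqrt(G) + eps))  # effective step size at t
    return rates

rates = adagrad_effective_lrs([1.0] * 100)
print(round(rates[0], 4), round(rates[3], 4), round(rates[99], 4))
# 0.1 0.05 0.01
```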
Geodesics on an ellipsoid
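Geodesics are curves that locally minimize length on a curved surface (shortest paths between nearby points); on a sphere they are arcs of great circles.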
Learning rate warmup starts training with a small learning rate to avoid early instability.
It increases the learning rate gradually, commonly linearly, from zero (or a small value) to the target value over a fixed number of initial steps.
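A minimal linear-warmup schedule, assuming a hypothetical base learning rate of 1e-3 and 1000 warmup steps. In practice warmup is usually followed by a decay schedule, which is omitted here.

```python
def warmup_lr(step, base_lr=1e-3, warmup_steps=1000):
    # Linear ramp from base_lr / warmup_steps up to base_lr, then hold constant
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr

# Rises to base_lr by step 999, then stays there
schedule = [warmup_lr(s) for s in (0, 499, 999, 5000)]
print(schedule)
```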