Gradient descent weight update equation: w := w - α * ∇J(w)
Gradient descent is an optimization algorithm that minimizes a function by iteratively stepping in the direction of steepest descent, which is the direction of the negative gradient.
The weight update equation w := w - α * ∇J(w) represents the core of gradient descent, where w is the current weight vector, α (alpha) is the learning rate, and ∇J(w) is the gradient of the cost function J with respect to the weights w.
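Because the negative gradient points in the direction of steepest local decrease, each update reduces J as long as the learning rate α is small enough. Repeated updates then typically move w toward a local minimum of J, which is the global minimum when J is convex; a learning rate that is too large can overshoot and diverge.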
Example
Suppose the cost function is J(w) = (w - 3)², so the gradient is ∇J(w) = 2(w - 3). With current weight w = 5 and learning rate α = 0.1, the update is w := 5 - 0.1 * 2(5 - 3) = 5 - 0.1 * 4 = 4.6, which moves w closer to the minimizer w = 3.
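The single step above can be repeated in a loop. The following is a minimal sketch, assuming the same cost function J(w) = (w - 3)², starting point w = 5, and learning rate α = 0.1; the names gradient_descent, grad_J, and history, and the choice of 50 steps, are illustrative and not part of the original example.

```python
def gradient_descent(grad, w0, alpha, steps):
    """Repeatedly apply w := w - alpha * grad(w) and record every iterate."""
    w = w0
    history = [w]
    for _ in range(steps):
        w = w - alpha * grad(w)  # the weight update equation
        history.append(w)
    return history


def grad_J(w):
    """Gradient of the assumed cost J(w) = (w - 3)**2."""
    return 2 * (w - 3)


# Assumed settings: start at w = 5 with alpha = 0.1, run 50 steps.
history = gradient_descent(grad_J, w0=5.0, alpha=0.1, steps=50)

print(history[1])   # first update, approximately 4.6 as in the worked example
print(history[-1])  # close to the minimizer w = 3
```

For this particular cost, each step multiplies the distance to w = 3 by (1 - 2α) = 0.8, so the iterates shrink steadily toward 3. With α greater than 1, the same factor would exceed 1 in magnitude and the iterates would move away from the minimum.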
Understanding the weight update equation is crucial for implementing gradient descent in machine learning models, enabling them to learn and minimize error effectively.
Adam optimizer weight update with m and v terms
Adam optimizer weight update: w_t = w_{t-1} - α * m̂_t / (sqrt(v̂_t) + ε), where m̂_t and v̂_t are the bias-corrected estimates of m_t and v_t
Regression analysis
Linear regression equation: ŷ = β0 + β1X
Stochastic gradient descent
Policy Gradient Theorem Equation
ReLU and Leaky ReLU
ReLU: f(x) = max(0, x); Leaky ReLU: f(x) = x if x > 0 else αx (0 < α < 1, typically small, e.g. 0.01)
Normalization (machine learning)
L2 normalization equation: x_i' = x_i / ||x||_2
Write the Bellman equation for reinforcement learning
Bellman equation: V(s) = max_a [R(s,a) + γ Σ_{s'} P(s'|s,a) V(s')]