Adam optimizer weight update with m and v terms

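Adam optimizer weight update: w_t = w_{t-1} - α * m̂_t / (sqrt(v̂_t) + ε), where m_t = β1 * m_{t-1} + (1 - β1) * g_t and v_t = β2 * v_{t-1} + (1 - β2) * g_t² are running averages of the gradient g_t and its square, and m̂_t = m_t / (1 - β1^t), v̂_t = v_t / (1 - β2^t) are their bias-corrected values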
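
A minimal sketch of one Adam step for a single scalar weight, using the bias-corrected moments of the standard algorithm. The function name `adam_step`, the toy objective J(w) = (w - 3)^2, and the hyperparameter values are assumptions chosen for illustration, not part of the card.

```python
import math

def adam_step(w, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # first moment: running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # second moment: running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                # bias-corrected second moment
    w = w - alpha * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# Toy objective (assumed): J(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    w, m, v = adam_step(w, 2 * (w - 3), m, v, t, alpha=0.05)
print(w)  # should end up close to the minimizer w = 3
```

On the very first step the update has magnitude of roughly α regardless of the gradient's scale, because m̂_1 / sqrt(v̂_1) is the sign of the gradient.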

Related concepts

Gradient descent

Gradient descent weight update equation: w := w - α * ∇J(w)
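
The update rule can be sketched as a short loop. The helper name `gradient_descent`, the objective J(w) = (w - 3)^2, and the values of α and the step count are assumptions for illustration.

```python
def gradient_descent(grad_fn, w0, alpha=0.1, steps=100):
    """Repeatedly apply w := w - alpha * grad_fn(w), starting from w0."""
    w = w0
    for _ in range(steps):
        w = w - alpha * grad_fn(w)
    return w

# Toy objective (assumed): J(w) = (w - 3)^2 has gradient 2 * (w - 3)
# and its minimum at w = 3.
w_min = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
print(w_min)
```

With α = 0.1 the distance to the minimum shrinks by a factor of 0.8 per step on this objective, so 100 steps are more than enough.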

Write the Bellman equation for reinforcement learning

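Bellman optimality equation: V*(s) = max_a [R(s,a) + γ Σ_{s'} P(s'|s,a) V*(s')], where γ is the discount factor and P(s'|s,a) is the probability of moving to state s' after taking action a in state s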
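
One common way to solve this equation is value iteration, which applies the right-hand side as an update until the values stop changing. The sketch below assumes a made-up two-state MDP; the function name, state and action labels, rewards, and γ = 0.9 are all hypothetical.

```python
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-8):
    """Iterate the Bellman optimality backup until values change by less than tol.

    P[s][a] is a dict {next_state: probability}; R[s][a] is the immediate reward.
    """
    V = {s: 0.0 for s in states}
    while True:
        new_V = {
            s: max(
                R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a].items())
                for a in actions
            )
            for s in states
        }
        if max(abs(new_V[s] - V[s]) for s in states) < tol:
            return new_V
        V = new_V

# Hypothetical MDP: "stay" keeps the state, "move" switches it.
# Only staying in state "B" pays a reward of 1.
states = ["A", "B"]
actions = ["stay", "move"]
P = {
    "A": {"stay": {"A": 1.0}, "move": {"B": 1.0}},
    "B": {"stay": {"B": 1.0}, "move": {"A": 1.0}},
}
R = {
    "A": {"stay": 0.0, "move": 0.0},
    "B": {"stay": 1.0, "move": 0.0},
}
V = value_iteration(states, actions, P, R)
print(V)
```

Solving by hand gives V(B) = 1 / (1 - 0.9) = 10 and V(A) = 0.9 * V(B) = 9, which the iteration approaches.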

Regression analysis

Linear regression equation: ŷ = β0 + β1X
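
A minimal sketch of estimating β0 and β1 by ordinary least squares for a single predictor. The function name `fit_line` and the data points are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ≈ b0 + b1 * x with one predictor."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Made-up points that lie exactly on y = 1 + 2x.
b0, b1 = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)
```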

Adam vs SGD: Adam adapts per-parameter rates, SGD often generalizes better with tuning

Adam adapts the learning rate for each parameter; SGD often generalizes better when carefully tuned

ReLU and Leaky ReLU

ReLU: f(x) = max(0, x); Leaky ReLU: f(x) = x if x > 0 else αx, with a small positive slope α (commonly 0.01)
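
Both activations written as plain scalar functions. The default slope of 0.01 is an assumed, commonly used value.

```python
def relu(x):
    """Pass positive inputs through unchanged; clamp everything else to zero."""
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    """Like ReLU, but scale non-positive inputs by alpha instead of zeroing them."""
    return x if x > 0 else alpha * x

for x in (-2.0, 0.0, 3.0):
    print(x, relu(x), leaky_relu(x))
```

The nonzero slope for negative inputs keeps a small gradient flowing where ReLU's gradient would be exactly zero.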

Adam has bias correction: divides by (1-β^t) in early steps

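Adam bias correction divides m_t by (1 - β1^t) and v_t by (1 - β2^t) to counteract the bias toward zero that comes from initializing both moment estimates at zero; the correction matters most in early steps and fades as t grows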
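
A small numeric sketch of why the correction helps. With a constant gradient g and the first moment started at zero, the raw average after t steps is (1 - β1^t) * g, so dividing by (1 - β1^t) recovers g. The function name `first_moment_trace` and the values g = 1.0 and β1 = 0.9 are assumptions.

```python
def first_moment_trace(g, beta1=0.9, steps=5):
    """Return (t, m_t, m_hat_t) for a constant gradient g, starting from m_0 = 0."""
    rows = []
    m = 0.0
    for t in range(1, steps + 1):
        m = beta1 * m + (1 - beta1) * g   # raw running average, biased toward 0
        m_hat = m / (1 - beta1 ** t)      # bias-corrected estimate
        rows.append((t, m, m_hat))
    return rows

for t, m, m_hat in first_moment_trace(g=1.0):
    print(t, round(m, 4), round(m_hat, 4))
```

The raw value starts at 0.1 and climbs slowly toward 1, while the corrected value sits at the true gradient from the first step.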
