
Adam optimizer weight update: w_t = w_{t-1} - α * m̂_t / (sqrt(v̂_t) + ε), where m̂_t and v̂_t are the bias-corrected first- and second-moment estimates of the gradient
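A minimal sketch of one Adam step for a single scalar parameter, in plain Python. The default hyperparameters (α = 0.001, β1 = 0.9, β2 = 0.999, ε = 1e-8) are the commonly cited defaults and are assumed here; the function name `adam_step` and the toy objective J(w) = (w - 3)^2 are hypothetical, chosen only for illustration.

```python
import math

def adam_step(w, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter w (hypothetical helper; t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients (first moment)
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients (second moment)
    m_hat = m / (1 - beta1 ** t)                # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                # bias-corrected second moment
    w = w - alpha * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# Toy objective J(w) = (w - 3)^2 with gradient 2 * (w - 3); minimum at w = 3.
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 5001):
    w, m, v = adam_step(w, 2 * (w - 3), m, v, t, alpha=0.01)
print(w)  # ends up near 3
```

Because m̂_1 / sqrt(v̂_1) equals g / |g| (ignoring ε), the very first step moves w by roughly α whatever the gradient's scale; this per-parameter normalization is what "adaptive" refers to.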
Gradient descent
Gradient descent weight update equation: w := w - α * ∇J(w)
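A minimal sketch of the update rule applied repeatedly to a one-dimensional toy loss. The function name `gradient_descent`, the loss J(w) = (w - 3)^2, and the chosen α and step count are hypothetical, for illustration only.

```python
def gradient_descent(grad, w0, alpha=0.1, steps=100):
    """Repeatedly apply w := w - alpha * grad(w) (hypothetical helper)."""
    w = w0
    for _ in range(steps):
        w = w - alpha * grad(w)
    return w

# J(w) = (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3.
w_star = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
print(w_star)  # very close to 3.0
```

For this loss each step multiplies the error (w - 3) by (1 - 2α), so the iteration converges only when 0 < α < 1; with α = 1.5 the error doubles in magnitude every step and the iterates diverge. This is why the learning rate has to be tuned.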
Write the Bellman equation for reinforcement learning
Bellman optimality equation: V(s) = max_a [R(s,a) + γ Σ_{s'} P(s'|s,a) V(s')]
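One standard way to solve this equation is value iteration, which applies the right-hand side as an update until V stops changing. The sketch below assumes a hypothetical two-state MDP (states 0 and 1, actions "stay" and "go", made-up rewards and transition probabilities) with γ = 0.9.

```python
# Hypothetical MDP: P[s][a] lists (next_state, probability); R[s][a] is the immediate reward.
P = {
    0: {"stay": [(0, 1.0)], "go": [(1, 0.8), (0, 0.2)]},
    1: {"stay": [(1, 1.0)], "go": [(0, 1.0)]},
}
R = {
    0: {"stay": 0.0, "go": 1.0},
    1: {"stay": 2.0, "go": 0.0},
}

def value_iteration(P, R, gamma=0.9, tol=1e-10):
    """Apply the Bellman optimality backup until successive estimates differ by < tol."""
    V = {s: 0.0 for s in P}
    while True:
        # V(s) = max_a [R(s,a) + gamma * sum_{s'} P(s'|s,a) * V(s')]
        new_V = {
            s: max(
                R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                for a in P[s]
            )
            for s in P
        }
        if max(abs(new_V[s] - V[s]) for s in P) < tol:
            return new_V
        V = new_V

V = value_iteration(P, R)
print(V)
```

Here the optimal policy is to stay in state 1 forever, so V(1) = 2 / (1 - 0.9) = 20, and from state 0 the best action is "go", giving V(0) = 15.4 / 0.82 ≈ 18.78. Because the backup is a contraction with factor γ, the loop is guaranteed to converge.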
Regression analysis
Simple linear regression equation (one predictor): ŷ = β0 + β1x
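The coefficients are usually fitted by ordinary least squares, which for one predictor has the closed form β1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)^2 and β0 = ȳ - β1x̄. A minimal sketch; the helper name and the four data points are made up for illustration.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y_hat = b0 + b1 * x (hypothetical helper)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sxy / sxx               # slope
    b0 = y_mean - b1 * x_mean    # intercept: the fitted line passes through (x_mean, y_mean)
    return b0, b1

# Illustrative data lying exactly on y = 1 + 2x.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
b0, b1 = fit_simple_linear_regression(x, y)
print(b0, b1)  # 1.0 2.0
```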
Adam vs SGD: Adam adapts the learning rate separately for each parameter; SGD with a well-tuned learning rate often generalizes better
ReLU and Leaky ReLU
ReLU: f(x) = max(0, x); Leaky ReLU: f(x) = x if x > 0 else αx, with a small slope 0 < α < 1
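A minimal sketch of both activations for scalar inputs. Note that α here is the leak slope for negative inputs, not the learning rate; the value 0.01 is a common choice and is assumed for illustration.

```python
def relu(x):
    """ReLU: passes positive inputs through, zeroes out the rest."""
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: negative inputs are scaled by a small slope alpha (0.01 assumed)."""
    return x if x > 0 else alpha * x

print([relu(v) for v in (-2.0, 0.0, 3.0)])        # [0.0, 0.0, 3.0]
print([leaky_relu(v) for v in (-2.0, 0.0, 3.0)])  # [-0.02, 0.0, 3.0]
```

The nonzero slope keeps a small gradient flowing for negative inputs, which is the usual motivation for Leaky ReLU over plain ReLU (units whose ReLU input stays negative receive zero gradient).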
Adam bias correction: m̂_t = m_t / (1 - β1^t) and v̂_t = v_t / (1 - β2^t). Because both moment estimates are initialized at zero, they are biased toward zero early in training; the correction matters mostly in the first steps, since (1 - β^t) approaches 1 as t grows
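A small numeric illustration, assuming a hypothetical constant gradient g = 5 and β1 = 0.9. With a constant gradient the raw estimate is m_t = (1 - β1^t) g, so dividing by (1 - β1^t) recovers g exactly.

```python
beta1 = 0.9
g = 5.0     # hypothetical constant gradient, chosen for illustration
m = 0.0     # first moment initialized at zero, as in Adam
history = []
for t in range(1, 6):
    m = beta1 * m + (1 - beta1) * g   # raw (biased) first-moment estimate
    m_hat = m / (1 - beta1 ** t)      # bias-corrected estimate
    history.append((t, m, m_hat))
    print(f"t={t}  m={m:.4f}  m_hat={m_hat:.4f}")
```

The raw m_t starts at 0.5, a tenth of the true gradient, and only slowly climbs toward 5, while m̂_t equals 5 from the first step. The second moment v_t behaves the same way with β2 in place of β1.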