Langevin dynamics adds noise to gradient descent to sample from a distribution
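As a rough illustration, the sketch below runs an unadjusted Langevin update on a one-dimensional target. The target (a standard normal), the step size, the burn-in length, and the function names are all assumptions chosen for the example, not something the card specifies.

```python
import math
import random


def grad_neg_log_density(x):
    # Assumed target: standard normal, so -log p(x) = x**2 / 2 (up to a constant)
    # and its gradient is simply x.
    return x


def langevin_samples(n_steps=50000, step_size=0.05, burn_in=2000, seed=0):
    rng = random.Random(seed)
    x = 3.0  # deliberately poor starting point
    samples = []
    for t in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        # Gradient descent step on -log p(x), plus Gaussian noise
        # scaled by sqrt(2 * step_size).
        x = x - step_size * grad_neg_log_density(x) + math.sqrt(2.0 * step_size) * noise
        if t >= burn_in:
            samples.append(x)
    return samples


samples = langevin_samples()
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"mean={mean:.2f}, var={var:.2f}")  # should land near 0 and 1
```

Without the noise term the iterate would simply settle at the mode x = 0; the injected noise is what turns the optimizer into a sampler. A finite step size leaves a small bias in the sampled variance.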
Diffusion model
The forward process q(x_t | x_{t-1}) adds a small amount of Gaussian noise at each step
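One common way to write this step, assumed here rather than taken from the card, is the variance-preserving form q(x_t | x_{t-1}) = N(x_t; sqrt(1 - beta_t) x_{t-1}, beta_t I). The sketch below applies it repeatedly to toy data. The linear beta schedule, the number of steps, and the function names are illustrative assumptions.

```python
import math
import random


def forward_step(x_prev, beta, rng):
    # Sample x_t ~ N(sqrt(1 - beta) * x_prev, beta * I), element by element.
    scale = math.sqrt(1.0 - beta)
    sigma = math.sqrt(beta)
    return [scale * v + sigma * rng.gauss(0.0, 1.0) for v in x_prev]


def forward_process(x0, betas, seed=0):
    rng = random.Random(seed)
    x = list(x0)
    for beta in betas:
        x = forward_step(x, beta, rng)
    return x


# Hypothetical linear noise schedule from 1e-4 to 0.02 over 1000 steps.
num_steps = 1000
betas = [1e-4 + (0.02 - 1e-4) * t / (num_steps - 1) for t in range(num_steps)]

x0 = [5.0] * 1000  # toy "data" sitting far from zero
xT = forward_process(x0, betas)

mean_T = sum(xT) / len(xT)
var_T = sum((v - mean_T) ** 2 for v in xT) / len(xT)
print(f"mean={mean_T:.2f}, var={var_T:.2f}")  # should land near 0 and 1
```

After enough steps the original signal is almost entirely replaced by noise, so the final values look like draws from a standard normal regardless of where x0 started.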
Gradient accumulation simulates larger batch sizes without more memory
Gradient accumulation reduces peak memory usage by splitting a large batch into smaller micro-batches and summing their gradients before a single weight update
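A minimal sketch of the idea, using a hypothetical one-parameter linear model and made-up data: the gradients of equally sized micro-batches, each scaled by the number of accumulation steps, add up to the full-batch gradient.

```python
def grad_mse(w, xs, ys):
    # Gradient of mean squared error for the 1-D model y_hat = w * x.
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


# Toy data, invented for the example.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]

w = 0.5
micro_batch_size = 2
accum_steps = len(xs) // micro_batch_size

# Reference: gradient computed on the whole batch at once.
full_grad = grad_mse(w, xs, ys)

# Accumulation: process one micro-batch at a time and sum the gradients.
accumulated = 0.0
for i in range(accum_steps):
    lo = i * micro_batch_size
    hi = lo + micro_batch_size
    # Dividing by accum_steps makes the sum equal the full-batch mean.
    accumulated += grad_mse(w, xs[lo:hi], ys[lo:hi]) / accum_steps

# Only one weight update happens, after all micro-batches are processed.
learning_rate = 0.01
w_new = w - learning_rate * accumulated

print(abs(full_grad - accumulated) < 1e-9)
```

Only one micro-batch needs to be held in memory at a time, which is where the saving comes from. The cost is more forward and backward passes per weight update.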
AdaGrad's effective learning rate decays toward zero
AdaGrad scales the learning rate by the inverse square root of the accumulated squared gradients; because that sum never decreases, the effective learning rate shrinks toward zero as training proceeds
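The decay can be seen in a small sketch that tracks only the per-step effective learning rate for a single parameter. The constant gradient of 1.0, the base learning rate, and the function name are assumptions chosen to keep the arithmetic easy to follow.

```python
import math


def adagrad_effective_rates(grads, base_lr=0.1, eps=1e-8):
    accum = 0.0
    rates = []
    for g in grads:
        accum += g * g  # running sum of squared gradients, never decreases
        rates.append(base_lr / (math.sqrt(accum) + eps))
    return rates


# With a constant gradient of 1.0 the accumulator equals the step count t,
# so the effective rate falls off like base_lr / sqrt(t).
rates = adagrad_effective_rates([1.0] * 10000)
print(rates[0], rates[99], rates[9999])
```

In this setting the accumulator grows linearly with the step count, so the rate decays polynomially rather than exponentially, but it still never recovers.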
Lyapunov exponents
Lyapunov exponents measure the average exponential rate at which nearby trajectories of a dynamical system diverge
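As an illustration, the sketch below estimates the Lyapunov exponent of the logistic map x -> r * x * (1 - x) by averaging log|f'(x)| along an orbit. The choice of map, the parameter values, and the function name are assumptions made for the example.

```python
import math


def logistic_lyapunov(r, x0=0.2, n_steps=100000, burn_in=1000):
    x = x0
    # Discard transients so the average reflects long-run behaviour.
    for _ in range(burn_in):
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(n_steps):
        # Derivative of the map is f'(x) = r * (1 - 2x); the tiny floor
        # guards against log(0) if the orbit lands exactly on x = 0.5.
        total += math.log(max(abs(r * (1.0 - 2.0 * x)), 1e-300))
        x = r * x * (1.0 - x)
    return total / n_steps


lam_stable = logistic_lyapunov(2.5)   # orbit settles on a fixed point
lam_chaotic = logistic_lyapunov(3.9)  # orbit is chaotic
print(f"r=2.5: {lam_stable:.3f}, r=3.9: {lam_chaotic:.3f}")
```

A negative value means nearby trajectories converge, as at r = 2.5 where the exponent equals ln(0.5). A positive value, as at r = 3.9, means they separate exponentially fast, which is the usual signature of chaos.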
When to standardize
Standardize features to zero mean and unit variance when training with gradient-based optimization, which tends to converge faster on similarly scaled inputs
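A minimal sketch of standardizing a single feature column follows. The function name and sample values are hypothetical, and the population variance (dividing by n) is an assumed convention.

```python
import math


def standardize(values):
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # population variance
    std = math.sqrt(var)
    if std == 0.0:
        # A constant feature carries no information; map it to zeros
        # instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]  # made-up values
z = standardize(heights_cm)

z_mean = sum(z) / len(z)
z_var = sum((v - z_mean) ** 2 for v in z) / len(z)
print(z_mean, z_var)
```

In practice the mean and standard deviation are usually computed on the training set only and then reused for validation and test data, so that no information leaks across the split.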
Stable Diffusion
Stable Diffusion generates images from text descriptions