Reparameterization trick

Reparameterization trick enables differentiable sampling for VAE training

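The reparameterization trick lets gradients flow through a sampling step by rewriting the random variable as a deterministic, differentiable function of the model parameters and an independent noise source. This is crucial for optimizing models with stochastic elements. Closely related pathwise gradient estimators date back to the 1980s, and the trick was applied to variational autoencoders in 2013.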

Example

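In a VAE, the encoder outputs a mean μ and a standard deviation σ for each latent variable. Instead of sampling z directly from N(μ, σ²), the model draws noise ε from N(0, I) and computes z = μ + σ⊙ε. The randomness now lives entirely in ε, so gradients of the loss can be backpropagated through z to μ and σ, and the encoder can be trained end to end.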
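As a rough illustration of the sampling step z = μ + σ⊙ε, the sketch below estimates the gradients of E[z²] with respect to μ and σ for a single scalar latent. The objective z² and the helper names `reparam_sample` and `pathwise_grads` are assumptions made for this example, not part of any VAE library; the derivatives are written out by hand rather than obtained from an autodiff framework.

```python
import random

def reparam_sample(mu, sigma, eps):
    """Reparameterized sample: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return mu + sigma * eps

def pathwise_grads(mu, sigma, n_samples=50_000, seed=0):
    """Monte Carlo estimate of d/dmu and d/dsigma of E[z^2], z ~ N(mu, sigma^2).

    Because z = mu + sigma * eps, the chain rule gives
    d(z^2)/dmu = 2z and d(z^2)/dsigma = 2z * eps.
    """
    rng = random.Random(seed)
    g_mu = 0.0
    g_sigma = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)   # noise does not depend on mu or sigma
        z = reparam_sample(mu, sigma, eps)
        g_mu += 2.0 * z             # uses dz/dmu = 1
        g_sigma += 2.0 * z * eps    # uses dz/dsigma = eps
    return g_mu / n_samples, g_sigma / n_samples

grad_mu, grad_sigma = pathwise_grads(mu=1.0, sigma=0.5)
# Analytically E[z^2] = mu^2 + sigma^2, so the exact gradients are 2*mu and 2*sigma.
print(grad_mu, grad_sigma)
```

In a real VAE the same idea is applied per latent dimension, and an autodiff framework computes the derivatives automatically.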

This technique is essential for training VAEs with stochastic gradient descent. Its gradient estimates also typically have lower variance than those of score-function (REINFORCE) estimators.
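To make the variance claim concrete, here is a small sketch comparing two unbiased estimators of d/dμ E[z²] for z drawn from N(μ, σ²): the reparameterized (pathwise) estimator and the score-function estimator. The toy objective z², the parameter values, and the function name `grad_mu_samples` are assumptions chosen for illustration; the size of the variance gap depends on the objective and the distribution.

```python
import random
import statistics

def grad_mu_samples(mu, sigma, n_samples=20_000, seed=0):
    """Per-sample estimates of d/dmu E[z^2] for z ~ N(mu, sigma^2)."""
    rng = random.Random(seed)
    pathwise, score = [], []
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)
        z = mu + sigma * eps
        # Pathwise: differentiate z^2 through z = mu + sigma * eps.
        pathwise.append(2.0 * z)
        # Score function: f(z) * d/dmu log N(z; mu, sigma^2).
        score.append(z * z * (z - mu) / sigma ** 2)
    return pathwise, score

pathwise, score = grad_mu_samples(mu=1.0, sigma=0.5)
print("pathwise: mean", statistics.fmean(pathwise),
      "variance", statistics.variance(pathwise))
print("score:    mean", statistics.fmean(score),
      "variance", statistics.variance(score))
```

Both estimators target the same gradient, 2μ, but in this setting the per-sample variance of the score-function estimator is much larger, so it needs many more samples to reach the same accuracy.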

Related concepts

Write the reparameterization trick z = μ + σ⊙ε

Reparameterization trick: z = μ + σ⊙ε, with ε drawn from N(0, I)

Batch size affects generalization: larger batches find sharper minima

Larger batch sizes give more accurate gradient estimates but tend to converge to sharper minima, which are associated with worse generalization; the gradient noise of smaller batches tends to favor flatter minima

Proximal gradient methods for learning

Proximal gradient descent efficiently handles non-differentiable L1 regularization by combining gradient descent with a proximity operator

Weight initialization matters: Xavier/He init keeps activation variance ≈ 1 across layers

Careful weight initialization stabilizes training by keeping activation variance roughly constant from layer to layer, which helps avoid vanishing or exploding signals

LAMB optimizer: layer-wise adaptive learning rates for large batch training

LAMB optimizer adjusts learning rates layer-wise for large batch training

MoE models have more parameters but similar compute cost

MoE models spread their parameters across many experts but route each token to only a few of them (top-k), so per-token compute stays close to that of a much smaller dense model
