Residual connections help by allowing gradient flow through the skip connection
Residual connections, also known as skip connections, enable gradients to bypass certain layers in a neural network. This helps mitigate the vanishing gradient problem by providing an alternative path for the gradient flow, ensuring that earlier layers receive sufficient gradient updates.
Example
In a deep neural network, a residual block might consist of a series of convolutional layers followed by a skip connection that adds the input of the block to its output. This allows the gradients to directly flow through the skip connection, bypassing the intermediate layers.
Residual connections are crucial for training deep neural networks effectively, as they help maintain stable gradient magnitudes throughout the network.
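The effect on gradient magnitude can be illustrated with a toy model. The sketch below assumes a stack of scalar linear layers, which is far simpler than a real convolutional block, and the function name `chain_gradient` is made up for this illustration. A plain layer contributes a factor `w` to the backpropagated gradient, while a residual layer contributes `1 + w`, because the skip path adds the identity.

```python
def chain_gradient(weights, residual):
    """Derivative of the stack's output with respect to its input.

    Plain layer:    y = w * x      -> local derivative w
    Residual layer: y = x + w * x  -> local derivative 1 + w
    """
    grad = 1.0
    for w in weights:
        grad *= (1.0 + w) if residual else w
    return grad


# Hypothetical setup: 20 layers, each with a small weight.
weights = [0.01] * 20
plain_grad = chain_gradient(weights, residual=False)
residual_grad = chain_gradient(weights, residual=True)
print("plain:", plain_grad)
print("residual:", residual_grad)
```

With small weights, the plain product collapses toward zero, while the residual product stays close to 1 because every factor includes the identity term.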
Pre-LN transformers are easier to train
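Both Pre-LN and Post-LN transformers use residual connections; the difference is where layer normalization sits. Pre-LN applies it inside the residual branch, before each sublayer, so the skip path stays an unnormalized identity and gradients flow more smoothly during backpropagation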
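The ordering of operations in the two block styles can be sketched in a few lines. This is a minimal illustration on plain Python lists, not a transformer implementation: `sublayer` is a hypothetical stand-in for attention or an MLP, and `layer_norm` omits the learned scale and shift.

```python
import math


def layer_norm(x, eps=1e-5):
    # Normalize a vector to zero mean and (roughly) unit variance.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def sublayer(x):
    # Hypothetical stand-in for attention or a feed-forward network.
    return [0.5 * v for v in x]


def pre_ln_block(x):
    # Pre-LN: normalize inside the branch; the skip path is untouched.
    branch = sublayer(layer_norm(x))
    return [a + b for a, b in zip(x, branch)]


def post_ln_block(x):
    # Post-LN: normalize after the residual sum, so the skip path
    # also passes through the normalization.
    summed = [a + b for a, b in zip(x, sublayer(x))]
    return layer_norm(summed)


x = [1.0, 2.0, 4.0]
print(pre_ln_block(x))
print(post_ln_block(x))
```

In `pre_ln_block` the input `x` reaches the output unchanged, plus the branch. In `post_ln_block` every block rescales the skip path as well, which is one common explanation for why deep Post-LN stacks are harder to train.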
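Gradient checkpointing trades compute for memory: it recomputes activations instead of storing them
Gradient checkpointing stores only a subset of activations during the forward pass and recomputes the rest during the backward pass, paying extra computation time for lower peak memory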
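The trade-off can be made concrete with a toy chain of scalar `tanh` layers. This is a sketch under simplifying assumptions: real frameworks checkpoint tensors inside an autograd graph, and the function names and the "values kept" count below are invented for illustration.

```python
import math


def layer(x, w):
    return math.tanh(w * x)


def layer_grad(x, w):
    # Derivative of layer(x, w) with respect to x, evaluated at input x.
    return w * (1.0 - math.tanh(w * x) ** 2)


def grad_store_all(x, weights):
    # Baseline: keep every activation from the forward pass.
    acts = [x]
    for w in weights:
        acts.append(layer(acts[-1], w))
    grad = 1.0
    for i in reversed(range(len(weights))):
        grad *= layer_grad(acts[i], weights[i])
    return grad, len(acts)


def grad_checkpointed(x, weights, segment):
    # Forward: keep only the input of each segment of `segment` layers.
    checkpoints = {}
    h = x
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints[i] = h
        h = layer(h, w)
    grad = 1.0
    peak = len(checkpoints)
    # Backward: walk segments in reverse, recomputing their activations.
    for start in sorted(checkpoints, reverse=True):
        end = min(start + segment, len(weights))
        acts = [checkpoints[start]]
        for i in range(start, end - 1):
            acts.append(layer(acts[-1], weights[i]))
        # acts[0] is itself a checkpoint, so do not count it twice.
        peak = max(peak, len(checkpoints) + len(acts) - 1)
        for i in reversed(range(start, end)):
            grad *= layer_grad(acts[i - start], weights[i])
    return grad, peak


weights = [0.9 + 0.01 * i for i in range(16)]
print(grad_store_all(0.3, weights))
print(grad_checkpointed(0.3, weights, segment=4))
```

Both functions return the same gradient; the checkpointed version keeps fewer values alive at once but runs most of the forward computation a second time.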
Gradient accumulation simulates larger batch sizes without more memory
Gradient accumulation splits a large batch into smaller micro-batches, processes them one at a time, and sums their gradients before a single weight update, so peak activation memory is that of one micro-batch
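A small check shows why the accumulated gradient matches the large-batch gradient. The sketch assumes a one-parameter linear model with mean squared error and a batch size divisible by the micro-batch size; the function names and data are illustrative only.

```python
def mse_grad(w, xs, ys):
    # Gradient of mean((w * x - y) ** 2) with respect to w.
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch_size):
    # Assumes len(xs) is a multiple of micro_batch_size.
    num_steps = len(xs) // micro_batch_size
    total = 0.0
    for k in range(num_steps):
        lo = k * micro_batch_size
        hi = lo + micro_batch_size
        # Scale each micro-batch gradient so the sum is a mean over the
        # full batch rather than a sum of per-micro-batch means.
        total += mse_grad(w, xs[lo:hi], ys[lo:hi]) / num_steps
    return total


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.2]
print(mse_grad(0.5, xs, ys))
print(accumulated_grad(0.5, xs, ys, micro_batch_size=2))
```

The division by `num_steps` is the detail that is easy to miss: without it, the accumulated gradient is `num_steps` times larger than the full-batch one.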
Dropout works as regularization: it approximates an ensemble of subnetworks
Dropout randomly deactivates neurons during training, simulating an ensemble of subnetworks, thus preventing co-adaptation and improving generalization
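A minimal sketch of the mechanism, assuming the common "inverted dropout" variant, in which kept activations are scaled up during training so that nothing needs rescaling at inference. The function name and arguments are illustrative and do not mirror any particular library's API.

```python
import random


def dropout(x, p, training, rng):
    """Zero each element with probability p; scale survivors by 1 / (1 - p)."""
    if not training or p == 0.0:
        # Inference: every unit is kept and no scaling is applied.
        return list(x)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in x]


rng = random.Random(0)
activations = [1.0] * 8
print(dropout(activations, p=0.5, training=True, rng=rng))
print(dropout(activations, p=0.5, training=False, rng=rng))
```

Each training call samples a different mask, which amounts to training a different subnetwork on each step; the scaling keeps the expected activation the same in both modes.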
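SGD with momentum can escape shallow local minima better than vanilla SGD
SGD with momentum keeps a velocity, an exponentially decaying sum of past gradients, so updates keep moving through flat regions and shallow local minima where the current gradient alone would stall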
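The velocity term can be sketched with the classical momentum update. The example below runs it on the quadratic f(w) = 0.5 * w**2, which is an assumption chosen for simplicity and has no local minima to escape; it only shows how the velocity carries the iterate forward. The function names are illustrative.

```python
def sgd_momentum_step(w, v, grad, lr, beta):
    # Velocity is a decaying sum of past (negative, scaled) gradients.
    v = beta * v - lr * grad
    return w + v, v


def minimize_quadratic(steps, lr, beta):
    # Minimize f(w) = 0.5 * w**2 from w = 1; beta = 0 is vanilla SGD.
    w, v = 1.0, 0.0
    for _ in range(steps):
        grad = w  # derivative of 0.5 * w**2
        w, v = sgd_momentum_step(w, v, grad, lr, beta)
    return w


print("vanilla: ", minimize_quadratic(100, lr=0.01, beta=0.0))
print("momentum:", minimize_quadratic(100, lr=0.01, beta=0.9))

# With a zero gradient, vanilla SGD would not move; momentum still does.
print(sgd_momentum_step(w=1.0, v=-0.5, grad=0.0, lr=0.1, beta=0.9))
```

The last call is the mechanism behind the claim about shallow minima: where the gradient vanishes, the remaining velocity still moves the parameters.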
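The elastic net combines L1 and L2: λ₁‖w‖₁ + λ₂‖w‖₂² gives both sparsity and stability
Elastic net: the penalty λ₁‖w‖₁ + λ₂‖w‖₂² is added to the loss; the L1 term drives some weights exactly to zero (sparsity), while the L2 term shrinks the remaining weights and stabilizes the solution when features are correlated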
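Both effects show up in the penalty's proximal step, sketched below. The penalty is taken in the form λ₁·Σ|wᵢ| + λ₂·Σwᵢ², and the function names and the `step` parameter are assumptions made for this illustration; libraries often use a different parameterization of the same idea.

```python
def elastic_net_penalty(w, lam1, lam2):
    # lam1 * L1 norm + lam2 * squared L2 norm
    return lam1 * sum(abs(v) for v in w) + lam2 * sum(v * v for v in w)


def elastic_net_prox(w, lam1, lam2, step):
    """Proximal step for step * elastic_net_penalty.

    Soft-thresholding (from the L1 term) sets small weights exactly to
    zero; the division (from the L2 term) shrinks the survivors.
    """
    out = []
    for v in w:
        shrunk = max(abs(v) - step * lam1, 0.0)
        sign = 1.0 if v > 0 else -1.0
        out.append(sign * shrunk / (1.0 + 2.0 * step * lam2))
    return out


w = [0.05, -2.0]
print(elastic_net_penalty(w, lam1=0.1, lam2=0.5))
print(elastic_net_prox(w, lam1=0.1, lam2=0.5, step=1.0))
```

The small weight falls below the threshold and becomes exactly zero, while the large weight survives but is shrunk, which is the sparsity-plus-stability behavior described above.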