
Ill-conditioned matrices amplify input perturbations, leading to significant output variability
Image: Unknown author, CC BY-SA 3.0 de, via Wikimedia Commons
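A small sketch of the amplification effect, using a hypothetical nearly singular 2x2 system chosen only for illustration (the matrix, the right-hand sides, and the helper `solve_2x2` are assumptions, not taken from the card):

```python
def solve_2x2(a, b):
    """Solve the 2x2 linear system a @ x = b with Cramer's rule."""
    (a11, a12), (a21, a22) = a
    det = a11 * a22 - a12 * a21
    x1 = (b[0] * a22 - a12 * b[1]) / det
    x2 = (a11 * b[1] - b[0] * a21) / det
    return [x1, x2]

# Illustrative ill-conditioned matrix: its two rows are almost parallel.
A = [[1.0, 1.0],
     [1.0, 1.0001]]

x = solve_2x2(A, [2.0, 2.0])               # close to [2, 0]
x_perturbed = solve_2x2(A, [2.0, 2.0001])  # close to [1, 1]

print(x, x_perturbed)
```

Changing one input entry by 0.0001 moves the solution by roughly 1 in each coordinate, which is the kind of output variability the card describes.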
The elastic net combines L1 and L2 penalties: λ₁‖w‖₁ + λ₂‖w‖₂² gives both sparsity and stability
Elastic net: the L1 term λ₁‖w‖₁ drives some weights to exactly zero, while the L2 term λ₂‖w‖₂² stabilizes the fit when features are correlated
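As a rough illustration, the penalty can be computed directly from a weight vector. The function name `elastic_net_penalty` and the example values are hypothetical:

```python
def elastic_net_penalty(weights, lam1, lam2):
    """Return lam1 * (sum of |w|) + lam2 * (sum of w^2) for a weight list."""
    l1_norm = sum(abs(w) for w in weights)
    l2_norm_squared = sum(w * w for w in weights)
    return lam1 * l1_norm + lam2 * l2_norm_squared

# Example weights: L1 norm = 2.5, squared L2 norm = 4.25.
penalty = elastic_net_penalty([0.5, -2.0, 0.0], lam1=0.1, lam2=0.01)
print(penalty)  # about 0.2925 = 0.1 * 2.5 + 0.01 * 4.25
```

In practice this term is added to the data-fitting loss; libraries may scale or parameterize the two coefficients differently.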
Weight initialization matters: Xavier/He init keeps activation variance roughly constant across layers (≈ 1 for standardized inputs)
Weight initialization stabilizes learning by keeping activation variance consistent from layer to layer, so signals neither vanish nor explode
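A minimal simulation of the idea, assuming He initialization (standard deviation sqrt(2 / fan_in)) with ReLU layers and a single random input. The layer width, depth, seed, and the comparison value 0.01 are arbitrary choices for illustration:

```python
import math
import random

def forward_mean_square(weight_std, width=128, depth=5, seed=0):
    """Push one random input through `depth` ReLU layers whose weights are
    drawn from N(0, weight_std^2); return the mean squared activation
    of the last layer."""
    rng = random.Random(seed)
    activations = [rng.gauss(0.0, 1.0) for _ in range(width)]
    for _ in range(depth):
        next_activations = []
        for _ in range(width):
            pre_activation = sum(rng.gauss(0.0, weight_std) * a for a in activations)
            next_activations.append(max(0.0, pre_activation))  # ReLU
        activations = next_activations
    return sum(a * a for a in activations) / width

width = 128
he_std = math.sqrt(2.0 / width)  # He initialization for ReLU layers
he_scale = forward_mean_square(he_std, width=width)
small_scale = forward_mean_square(0.01, width=width)  # naive small init

print(he_scale, small_scale)
```

With the He scale the signal should stay on the order of 1 after five layers, while the naive small initialization shrinks it by many orders of magnitude.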
AdaGrad's effective learning rate decays toward zero
AdaGrad scales each parameter's learning rate by the inverse square root of its accumulated squared gradients; because that sum never shrinks, the effective rate can decay toward zero over long training runs
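A sketch of how the effective step size shrinks for a single parameter, assuming the usual AdaGrad form base_lr / (sqrt(sum of g²) + eps). The constant gradient of 1.0 and the hyperparameter values are made up for the example:

```python
import math

def adagrad_effective_rates(gradients, base_lr=0.1, eps=1e-8):
    """Return the effective learning rate at each step for one parameter."""
    accumulated = 0.0
    rates = []
    for g in gradients:
        accumulated += g * g  # the running sum only ever grows
        rates.append(base_lr / (math.sqrt(accumulated) + eps))
    return rates

rates = adagrad_effective_rates([1.0] * 100)
print(rates[0], rates[-1])  # about 0.1 at step 1, about 0.01 at step 100
```

With a constant gradient the accumulator grows linearly, so the effective rate falls off like 1/sqrt(t) rather than exponentially.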
When to standardize: when you need zero mean and unit variance for gradient-based optimization
Standardize when zero mean and unit variance are required for gradient-based optimization
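Standardization itself is a short computation. A minimal sketch using Python's `statistics` module, with a hypothetical helper `standardize` and made-up feature values:

```python
import statistics

def standardize(values):
    """Rescale values to zero mean and unit (population) variance.
    Assumes the values are not all identical, otherwise std is zero."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

scaled = standardize([10.0, 20.0, 30.0, 40.0])
print(statistics.mean(scaled), statistics.pstdev(scaled))
```

The mean and standard deviation should be computed on the training data only and then reused for validation and test data.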
Eigenvalues and eigenvectors
An eigenvector keeps its direction under a linear transformation; it is only scaled by its eigenvalue: Av = λv (a negative λ flips it along the same line)
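A quick check of Av = λv on a small symmetric matrix picked for illustration; the matrix and the helper `matvec` are assumptions, not part of the card:

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

A = [[2.0, 1.0],
     [1.0, 2.0]]

# Eigenpairs of this particular matrix: (3, [1, 1]) and (1, [1, -1]).
Av1 = matvec(A, [1.0, 1.0])     # [3.0, 3.0], i.e. 3 * [1, 1]
Av2 = matvec(A, [1.0, -1.0])    # [1.0, -1.0], i.e. 1 * [1, -1]

# A vector that is not an eigenvector changes direction.
other = matvec(A, [1.0, 0.0])   # [2.0, 1.0], not a multiple of [1, 0]

print(Av1, Av2, other)
```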
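Log-loss / cross-entropy loss penalizes confident wrong predictions more heavily
Log-loss penalizes confident incorrect predictions more heavily: the penalty −log(p) grows without bound as the probability p assigned to the true class approaches 0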
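To make the asymmetry concrete, here is a sketch of binary log-loss for a single example. The clipping constant `eps` and the probabilities are illustrative choices:

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Binary cross-entropy for one example; p_pred is the predicted
    probability of class 1, clipped away from 0 and 1 to avoid log(0)."""
    p = min(max(p_pred, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))

# The true label is 1 in all three cases.
confident_right = log_loss(1, 0.99)  # about 0.01
unsure = log_loss(1, 0.60)           # about 0.51
confident_wrong = log_loss(1, 0.01)  # about 4.61

print(confident_right, unsure, confident_wrong)
```

Moving from an unsure prediction to a confidently wrong one raises the loss by roughly a factor of nine here, while being confidently right lowers it by only about 0.5.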