Over-smoothing in GNNs: Stacking many layers makes node features converge toward the same values, so nodes lose their distinguishing information
Image: Mike Tigas from Columbia, MO, United States, CC BY 2.0, via Wikimedia Commons
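The over-smoothing effect can be seen in a small sketch. It assumes a hypothetical 4-node path graph with scalar features and uses plain neighbor averaging (self included) as a stand-in for a GNN layer, with no learned weights. The graph, the feature values, and the helper names are illustrative assumptions.

```python
def mean_aggregate(features, neighbors):
    """One round: replace each node's feature with the mean over itself and its neighbors."""
    out = []
    for node, nbrs in enumerate(neighbors):
        group = [node] + nbrs
        out.append(sum(features[j] for j in group) / len(group))
    return out


def spread(features):
    """Gap between the largest and smallest node feature."""
    return max(features) - min(features)


# Hypothetical path graph 0-1-2-3 with one scalar feature per node.
neighbors = [[1], [0, 2], [1, 3], [2]]
features = [0.0, 1.0, 2.0, 3.0]

spreads = [spread(features)]
for _ in range(20):  # 20 stacked "layers" of averaging
    features = mean_aggregate(features, neighbors)
    spreads.append(spread(features))

print(spreads[0], spreads[-1])
```

The spread between node features shrinks with every round, which is the loss of node identity the card describes.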
Message passing in GNNs
Each node aggregates features from its neighbors and uses the result to update its own representation
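A minimal sketch of one message-passing round, assuming sum aggregation and an update that simply adds the aggregate to the node's own features. Real GNN layers typically apply learned transformations here; the graph, node names, and function name are hypothetical.

```python
def message_passing_step(features, edges):
    """One round: every node sums its neighbors' feature vectors, then adds its own."""
    dim = len(next(iter(features.values())))
    aggregated = {node: [0.0] * dim for node in features}
    for src, dst in edges:
        # Undirected edge: each endpoint sends its feature vector to the other.
        for sender, receiver in ((src, dst), (dst, src)):
            for k in range(dim):
                aggregated[receiver][k] += features[sender][k]
    # Update: combine the node's own features with the aggregated messages.
    return {
        node: [features[node][k] + aggregated[node][k] for k in range(dim)]
        for node in features
    }


# Hypothetical 3-node graph a-b-c with 2-dimensional features.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [2.0, 2.0]}
edges = [("a", "b"), ("b", "c")]
updated = message_passing_step(features, edges)
print(updated)
```

Node b has two neighbors, so its updated vector mixes information from both a and c, while a and c each see only b.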
GCN (Graph Convolutional Network)
GCN approximates spectral graph convolution with a first-order scheme that reduces to degree-normalized neighbor averaging, with each node included in its own neighborhood
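The normalized averaging can be sketched as follows, assuming the commonly used symmetric normalization D^(-1/2) (A + I) D^(-1/2) and leaving out the learned weight matrix and the nonlinearity. The 3-node graph and the function names are illustrative assumptions.

```python
import math


def gcn_normalized_adjacency(adj):
    """Add self-loops, then scale entry (i, j) by 1 / sqrt(deg_i * deg_j)."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def propagate(a_hat, h):
    """Multiply the normalized adjacency by the node features (no weights, no activation)."""
    n, d = len(h), len(h[0])
    return [[sum(a_hat[i][j] * h[j][k] for j in range(n)) for k in range(d)] for i in range(n)]


# Hypothetical path graph 0-1-2 with one feature per node.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
h = [[1.0], [2.0], [3.0]]

a_hat = gcn_normalized_adjacency(adj)
h_next = propagate(a_hat, h)
print(h_next)
```

A full GCN layer would follow this propagation with a learned linear map and an activation such as ReLU.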
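Pre-LN transformers are easier to train
Pre-LN transformers apply layer normalization inside each residual branch, before the attention or feed-forward sublayer, so the skip path stays an identity and gradients flow more smoothly during backpropagation. Post-LN transformers also use residual connections but normalize after the addition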
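The difference between the two orderings can be sketched on a single vector. This assumes a layer norm without learnable gain and bias and a hypothetical toy sublayer standing in for attention or the feed-forward network; none of these names come from a specific library.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def toy_sublayer(x):
    """Hypothetical stand-in for attention or a feed-forward layer."""
    return [0.5 * v for v in x]


def pre_ln_block(x):
    # Pre-LN: normalize inside the branch; the skip path carries x unchanged.
    branch = toy_sublayer(layer_norm(x))
    return [xi + bi for xi, bi in zip(x, branch)]


def post_ln_block(x):
    # Post-LN: add the branch first, then normalize the sum (skip path included).
    summed = [xi + si for xi, si in zip(x, toy_sublayer(x))]
    return layer_norm(summed)


x = [1.0, 2.0, 4.0, 9.0]
pre_out = pre_ln_block(x)
post_out = post_ln_block(x)
print(pre_out, post_out)
```

In the Pre-LN block the input reaches the output untouched through the addition, whereas in the Post-LN block every path, including the skip, passes through the normalization.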
Batch size affects generalization
Larger batches tend to converge to sharper minima, which often generalize worse. Their gradient estimates are more accurate, but the reduced gradient noise removes the implicit regularization that helps small batches settle into flatter minima
Vanishing gradient problem
Gradients shrink as they are multiplied through many layers during backpropagation. Residual connections help by letting the gradient flow through the skip connection
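A toy calculation illustrates the point, assuming a chain of scalar layers that each have the same small local derivative. With a skip connection y = x + f(x), the per-layer factor becomes 1 + f'(x) instead of f'(x). The depth and derivative value are arbitrary assumptions.

```python
def chain_gradient(local_derivatives, residual):
    """Multiply per-layer derivatives, as the chain rule does during backpropagation."""
    grad = 1.0
    for d in local_derivatives:
        # With a skip connection, the identity path contributes an extra 1.
        grad *= (1.0 + d) if residual else d
    return grad


# Hypothetical 30-layer chain where every layer has local derivative 0.01.
local = [0.01] * 30
plain_grad = chain_gradient(local, residual=False)
skip_grad = chain_gradient(local, residual=True)
print(plain_grad, skip_grad)
```

Without the skip path the gradient collapses toward zero, while with it the gradient stays on the order of 1.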
Adam bias correction
Adam initializes its moment estimates at zero, so they are biased toward zero in early steps. Bias correction divides each estimate by (1-β^t) to counteract this, and the factor approaches 1 as t grows
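The effect of the correction is easy to see on the first-moment estimate alone. This sketch assumes a constant gradient of 1.0 and β = 0.9, tracks only the moving average, and omits the second moment and the parameter update; the function name is hypothetical.

```python
def ema_with_bias_correction(grads, beta=0.9):
    """Return the raw and bias-corrected moving averages of a gradient sequence."""
    m = 0.0  # moment estimate starts at zero, which causes the early bias
    raw, corrected = [], []
    for t, g in enumerate(grads, start=1):
        m = beta * m + (1.0 - beta) * g
        raw.append(m)
        corrected.append(m / (1.0 - beta ** t))
    return raw, corrected


raw, corrected = ema_with_bias_correction([1.0] * 5)
print(raw)
print(corrected)
```

With a constant gradient, the raw average starts near 0.1 and creeps upward, while the corrected value sits at the true gradient from the first step.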