Over-smoothing in GNNs: Deeper layers cause node features to converge too much, losing unique node identities

Image: Mike Tigas from Columbia, MO, United States, CC BY 2.0, via Wikimedia Commons

Over-smoothing problem in GNNs: deep GNNs make all node features converge

As more layers are stacked, repeated neighbor aggregation pulls node features toward a common value, so nodes lose the distinguishing information that tasks such as node classification depend on
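The convergence can be seen in a minimal sketch. It assumes a hypothetical 4-node path graph and a parameter-free layer that averages each node with its neighbors; the graph, the starting features, and the function names are illustrative and not taken from any particular library.

```python
# Hypothetical 4-node path graph: 0 - 1 - 2 - 3 (illustration only).
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}


def mean_aggregate(features):
    """One parameter-free layer: each node averages itself and its neighbors."""
    new_features = []
    for node in range(len(features)):
        group = [node] + neighbors[node]
        new_features.append(sum(features[n] for n in group) / len(group))
    return new_features


def spread(features):
    """Gap between the largest and smallest node feature."""
    return max(features) - min(features)


features = [1.0, 0.0, 0.0, -1.0]  # one scalar feature per node, all distinct at the ends
spread_by_depth = [spread(features)]
for _ in range(50):
    features = mean_aggregate(features)
    spread_by_depth.append(spread(features))

# The spread shrinks with depth: the node features become nearly identical.
print(spread_by_depth[0], spread_by_depth[5], spread_by_depth[50])
```

Real GNN layers also apply learned weights and nonlinearities, so this only isolates the smoothing effect of the aggregation step.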

Related concepts

Message passing in GNNs: each node aggregates features from its neighbors

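In each message-passing layer, a node collects its neighbors' features, aggregates them (for example by sum, mean, or max), and uses the result to update its own representation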

GCN (Graph Convolutional Network): spectral convolution approximated by neighbor averaging

GCN approximates spectral graph convolution with a first-order (one-hop) normalized average over each node and its neighbors
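Written out, the layer-wise rule usually given for GCN looks like the following. The symbols are assumptions introduced here for illustration: A is the adjacency matrix, I the identity (self-loops), H^{(l)} the node features at layer l, W^{(l)} a learned weight matrix, and σ a nonlinearity such as ReLU.

```latex
\tilde{A} = A + I, \qquad
\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}, \qquad
H^{(l+1)} = \sigma\!\left( \tilde{D}^{-1/2} \, \tilde{A} \, \tilde{D}^{-1/2} \, H^{(l)} \, W^{(l)} \right)
```

The product with the normalized matrix is the "neighbor averaging" step: each node's new feature is a degree-weighted combination of its own feature and those of its one-hop neighbors.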

Pre-LN transformers are easier to train

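Pre-LN transformers apply layer normalization inside the residual branch, before each sublayer, so the skip path stays an unnormalized identity and gradients flow more smoothly during backpropagation (Post-LN also uses residual connections, but normalizes after the addition)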
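The difference in ordering can be sketched in a few lines. This is a simplified illustration, not a full transformer block: `sublayer` is a hypothetical stand-in for attention or the feed-forward network, and `layer_norm` omits the learned scale and shift.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned scale or shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def sublayer(x):
    """Hypothetical stand-in for an attention or feed-forward sublayer."""
    return [0.5 * v + 1.0 for v in x]


def post_ln_block(x):
    # Post-LN: add the residual first, then normalize the sum.
    # The skip path itself passes through the normalization.
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])


def pre_ln_block(x):
    # Pre-LN: normalize only the sublayer input.
    # The skip path carries x through unchanged.
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]
```

In `pre_ln_block` the output is exactly `x` plus a correction term, so the identity path from input to output is never rescaled, which is the property usually credited for the easier training.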

Batch size and generalization: larger batches tend to find sharper minima

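Larger batch sizes give more accurate, less noisy gradient estimates, but they tend to converge to sharper minima, which are associated with worse generalization; the gradient noise of smaller batches is thought to help the optimizer settle in flatter minima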

Vanishing gradient problem

Residual connections help: the skip connection adds an identity path, so gradients can reach earlier layers without being scaled down at every layer
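A one-line derivation shows where the identity path comes from. The symbols are assumptions for illustration: x is the block input, F the residual branch, y the block output, and \mathcal{L} the loss.

```latex
y = x + F(x)
\quad\Longrightarrow\quad
\frac{\partial \mathcal{L}}{\partial x}
  = \frac{\partial \mathcal{L}}{\partial y}
    \left( I + \frac{\partial F}{\partial x} \right)
```

Even when \partial F / \partial x is small, the identity term passes \partial \mathcal{L} / \partial y back unchanged, whereas a plain layer y = F(x) would multiply the gradient by \partial F / \partial x alone.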

Adam bias correction: divides the moment estimates by (1-β^t), which matters most in early steps

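Adam bias correction divides each moment estimate by (1-β^t) to counteract the bias toward zero caused by initializing the moving averages at zero; the correction matters most in early steps and fades as t grows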
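A small numeric sketch makes the effect visible. It tracks only the first-moment estimate for a constant gradient of 1.0; the function name and the choice of β = 0.9 are assumptions for illustration, and the second moment is corrected the same way with its own β.

```python
def first_moment_estimates(grads, beta1=0.9):
    """Return the raw and bias-corrected moving averages of the gradients."""
    m = 0.0  # the moving average starts at zero, which is the source of the bias
    raw, corrected = [], []
    for t, g in enumerate(grads, start=1):
        m = beta1 * m + (1 - beta1) * g
        raw.append(m)
        corrected.append(m / (1 - beta1 ** t))  # bias correction
    return raw, corrected


# Constant gradient of 1.0, so the true average is 1.0 at every step.
raw, corrected = first_moment_estimates([1.0] * 100)
print(raw[0], corrected[0])    # raw estimate starts far below 1.0; corrected does not
print(raw[99], corrected[99])  # by step 100 the two nearly agree
```

Because 1-β^t approaches 1 as t grows, the division has almost no effect late in training.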
