
Lottery ticket hypothesis posits sparse subnetworks can match full network performance
Image: Watchers Club, CC BY 3.0, via Wikimedia Commons
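The lottery ticket hypothesis posits that a dense, randomly initialized network contains sparse subnetworks ("winning tickets") that, trained in isolation from their original initialization, can match the full network's performance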
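A minimal sketch of how a candidate ticket could be extracted with one round of magnitude pruning, assuming NumPy. The helper `magnitude_mask`, the 80% sparsity level, and the random stand-in for training are illustrative assumptions, not details from this card.

```python
import numpy as np

def magnitude_mask(trained_weights, sparsity):
    """Return a 0/1 mask that keeps the largest-magnitude trained weights."""
    flat = np.abs(trained_weights).ravel()
    n_prune = int(round(sparsity * flat.size))
    mask = np.ones(flat.size)
    # Zero out the n_prune weights with the smallest magnitude.
    mask[np.argsort(flat)[:n_prune]] = 0.0
    return mask.reshape(trained_weights.shape)

rng = np.random.default_rng(0)
init_weights = rng.normal(size=(4, 5))
# Stand-in for real training (assumption): perturb the initial weights.
trained_weights = init_weights + rng.normal(scale=0.5, size=(4, 5))

mask = magnitude_mask(trained_weights, sparsity=0.8)
# The candidate ticket reuses the ORIGINAL initialization under the mask.
ticket = mask * init_weights
```

The sparse `ticket` would then be retrained with the mask held fixed and compared against the dense network.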
Why dropout works as regularization: it approximates an ensemble of subnetworks
Dropout randomly deactivates neurons during training, so each step trains a different subnetwork. This approximates an ensemble, discourages co-adaptation between neurons, and improves generalization
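As a rough illustration, inverted dropout can be sketched in NumPy as follows. The function name `dropout` and the drop probability of 0.5 are assumptions chosen for the example.

```python
import numpy as np

def dropout(x, p_drop, rng, training=True):
    """Inverted dropout: zero units with probability p_drop, rescale the rest."""
    if not training or p_drop == 0.0:
        return x  # at inference time the full network is used unchanged
    keep = rng.random(x.shape) >= p_drop
    # Dividing by (1 - p_drop) keeps the expected activation the same.
    return x * keep / (1.0 - p_drop)

rng = np.random.default_rng(0)
activations = np.ones((1000, 100))
dropped = dropout(activations, p_drop=0.5, rng=rng)
```

Each call draws a fresh mask, which is what makes every training step see a different subnetwork.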
Why the curse of dimensionality makes nearest neighbor search unreliable
In high dimensions data becomes sparse and pairwise distances concentrate, so the nearest and farthest points are almost equally far from a query and nearest neighbor search becomes unreliable
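The effect can be checked empirically. The sketch below, assuming uniformly random points in the unit hypercube, measures how much farther the farthest point is than the nearest one; `relative_contrast` is a name introduced here for illustration.

```python
import numpy as np

def relative_contrast(dim, n_points=500, seed=0):
    """(farthest - nearest) / nearest distance from a random query point."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

As the dimension grows the contrast shrinks toward zero, meaning the "nearest" neighbor is barely closer than any other point.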
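What GraphSAGE does: samples and aggregates a fixed-size neighborhood
GraphSAGE samples a fixed-size set of neighbors for each node and aggregates their features, which keeps the cost per node bounded and lets it embed nodes not seen during training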
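A minimal single-layer sketch with a mean aggregator, assuming NumPy and a toy adjacency list. The function names, the sample size of 2, and the random weights are illustrative assumptions, and the output normalization used in the original method is omitted.

```python
import numpy as np

def sample_neighbors(adj_list, node, num_samples, rng):
    """Draw a fixed number of neighbors, with replacement if there are too few."""
    neighbors = adj_list[node]
    replace = len(neighbors) < num_samples
    return rng.choice(neighbors, size=num_samples, replace=replace)

def sage_mean_layer(features, adj_list, weight, num_samples, rng):
    """Concatenate each node's features with its sampled-neighbor mean, then project."""
    out = []
    for node in range(features.shape[0]):
        sampled = sample_neighbors(adj_list, node, num_samples, rng)
        neigh_mean = features[sampled].mean(axis=0)
        combined = np.concatenate([features[node], neigh_mean])
        out.append(np.maximum(weight @ combined, 0.0))  # linear map + ReLU
    return np.stack(out)

rng = np.random.default_rng(0)
adj_list = {0: [1, 2, 3], 1: [0], 2: [0, 3], 3: [0, 2]}
features = rng.normal(size=(4, 8))
weight = rng.normal(size=(16, 2 * 8))
embeddings = sage_mean_layer(features, adj_list, weight, num_samples=2, rng=rng)
```

Because every node contributes exactly `num_samples` neighbors, the work per node stays constant regardless of its true degree.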
Chebyshev's inequality
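Chebyshev's inequality bounds the probability that a random variable with finite variance deviates from its mean by at least k standard deviations: P(|X - μ| ≥ kσ) ≤ 1/k², for any distribution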
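The bound P(|X - μ| ≥ kσ) ≤ 1/k² can be checked empirically. The sketch below uses an exponential sample purely as an assumed example distribution; `tail_fraction` is a helper name introduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.exponential(scale=1.0, size=100_000)
mu, sigma = samples.mean(), samples.std()

def tail_fraction(k):
    """Fraction of samples at least k standard deviations from the mean."""
    return np.mean(np.abs(samples - mu) >= k * sigma)

for k in (1.5, 2, 3):
    # Observed tail mass next to the Chebyshev bound 1 / k^2.
    print(k, tail_fraction(k), 1 / k**2)
```

The observed tail fractions sit well below the bound, which is expected: Chebyshev holds for every distribution with finite variance, so it is loose for any particular one.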
What message passing does in GNNs: each node aggregates features from its neighbors
In each message passing step, every node aggregates the features of its neighbors and uses the result to update its own representation
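One step of message passing can be sketched with an adjacency matrix. The mean aggregator and the equal-weight update below are simplifying assumptions; real GNN layers typically use learned weights and a nonlinearity.

```python
import numpy as np

# Toy undirected graph with edges 0-1, 0-2, 1-2, 2-3.
adjacency = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
features = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]])

def message_passing_step(adjacency, features):
    """Average neighbor features, then blend them with each node's own features."""
    degree = adjacency.sum(axis=1, keepdims=True)
    messages = adjacency @ features               # sum of neighbor features
    neighbor_mean = messages / np.maximum(degree, 1.0)
    return 0.5 * (features + neighbor_mean)       # simple unweighted update

updated = message_passing_step(adjacency, features)
```

Stacking several such steps lets information travel farther than one hop from each node.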
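What ring attention does: distributes long sequences across multiple devices
Ring attention splits a long sequence into blocks held on different devices arranged in a ring. Each device keeps its query block while key/value blocks are passed around the ring, so no single device has to hold the full sequence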
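The idea can be simulated on one machine. In the sketch below, assuming NumPy and non-causal attention, each "device" is a loop iteration that owns one query block and visits the key/value blocks in ring order, combining them with a running softmax. The sequential loops stand in for devices that would actually run in parallel and exchange blocks over the network.

```python
import numpy as np

def full_attention(q, k, v):
    """Reference softmax attention over the whole sequence."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return (weights / weights.sum(axis=-1, keepdims=True)) @ v

def ring_attention(q, k, v, num_devices):
    """Blockwise attention; sequence length must divide evenly by num_devices."""
    q_blocks = np.split(q, num_devices)
    k_blocks = np.split(k, num_devices)
    v_blocks = np.split(v, num_devices)
    scale = 1.0 / np.sqrt(q.shape[-1])
    outputs = []
    for dev, q_blk in enumerate(q_blocks):
        running_max = np.full((q_blk.shape[0], 1), -np.inf)
        denom = np.zeros((q_blk.shape[0], 1))
        numer = np.zeros((q_blk.shape[0], v.shape[-1]))
        for step in range(num_devices):
            src = (dev - step) % num_devices  # K/V block held after `step` rotations
            scores = q_blk @ k_blocks[src].T * scale
            new_max = np.maximum(running_max, scores.max(axis=-1, keepdims=True))
            correction = np.exp(running_max - new_max)  # rescale earlier partial sums
            p = np.exp(scores - new_max)
            denom = denom * correction + p.sum(axis=-1, keepdims=True)
            numer = numer * correction + p @ v_blocks[src]
            running_max = new_max
        outputs.append(numer / denom)
    return np.concatenate(outputs)

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(8, 4)) for _ in range(3))
ring_out = ring_attention(q, k, v, num_devices=4)
```

The blockwise result matches ordinary attention up to floating-point error, while each simulated device only ever touches one key/value block at a time.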