Adam vs SGD: Adam adapts per-parameter rates, SGD often generalizes better with tuning

Adam adapts the learning rate for each parameter individually; SGD, with careful tuning, often generalizes better
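To make "per-parameter rates" concrete, here is a minimal NumPy sketch of one update step under each rule. It is only an illustration: the helper names `sgd_step` and `adam_step`, the toy gradient values, and the learning rate of 0.1 are assumptions chosen for this example, and the Adam defaults (`beta1=0.9`, `beta2=0.999`, `eps=1e-8`) are the commonly used ones rather than anything this page specifies.

```python
import numpy as np

def sgd_step(params, grads, lr=0.1):
    """Plain SGD: every parameter shares the same learning rate."""
    return params - lr * grads

def adam_step(params, grads, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; m and v are running first/second moment estimates."""
    m = beta1 * m + (1 - beta1) * grads          # moving average of gradients
    v = beta2 * v + (1 - beta2) * grads ** 2     # moving average of squared gradients
    m_hat = m / (1 - beta1 ** t)                 # bias correction for step t
    v_hat = v / (1 - beta2 ** t)
    new_params = params - lr * m_hat / (np.sqrt(v_hat) + eps)
    return new_params, m, v

# Two parameters whose gradients differ by four orders of magnitude (toy values).
params = np.array([1.0, 1.0])
grads = np.array([100.0, 0.01])

sgd_delta = sgd_step(params, grads) - params
adam_params, m, v = adam_step(params, grads, np.zeros(2), np.zeros(2), t=1)
adam_delta = adam_params - params

print("SGD step: ", sgd_delta)
print("Adam step:", adam_delta)
```

In this toy setup the SGD step is proportional to each gradient, so the two parameters move by very different amounts, while Adam's first step has roughly the same magnitude (about `lr`) for both because each gradient is divided by its own running scale estimate. The sketch does not address the generalization claim, which is an empirical observation about trained models rather than something a single update step can show.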

Image: Official GDC, CC BY 2.0, via Wikimedia Commons
