Compute-optimal training ratio: roughly 20 tokens per parameter

Image: BlendoGames, CC BY 2.0, via Wikimedia Commons

The compute-optimal training ratio is roughly 20 training tokens per model parameter.
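The ratio can be turned into quick arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the compute estimate C ≈ 6 · N · D (FLOPs for N parameters and D tokens) is a common approximation assumed here, not something stated in this card.

```python
import math

# Rough ratio stated above: about 20 training tokens per parameter.
TOKENS_PER_PARAM = 20.0


def optimal_tokens(n_params: float, ratio: float = TOKENS_PER_PARAM) -> float:
    """Token budget suggested by the ratio for a model with n_params parameters."""
    return ratio * n_params


def approx_training_flops(n_params: float, n_tokens: float) -> float:
    """Assumed approximation (not from the source): C ~= 6 * N * D FLOPs."""
    return 6.0 * n_params * n_tokens


def optimal_split(flops_budget: float, ratio: float = TOKENS_PER_PARAM):
    """Given a compute budget, solve C = 6 * N * (ratio * N) for N, then D = ratio * N."""
    n_params = math.sqrt(flops_budget / (6.0 * ratio))
    return n_params, ratio * n_params


if __name__ == "__main__":
    n = 70e9  # a 70-billion-parameter model, chosen only as an example
    d = optimal_tokens(n)
    print(f"tokens: {d:.3g}, approx FLOPs: {approx_training_flops(n, d):.3g}")
```

Under these assumptions, doubling the parameter count doubles the suggested token budget and roughly quadruples the training compute.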
