MoE models have more parameters but similar compute cost

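Mixture-of-experts (MoE) models distribute their parameters across k experts, but only a subset of those experts is active for each input, so per-input compute cost depends on the active experts rather than on the total parameter count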
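
To make the parameter-versus-compute gap concrete, here is a minimal sketch that counts the parameters of one MoE feed-forward layer. Everything in it is an assumption for illustration: the function name moe_ffn_params, the layer sizes, the two-matrix expert shape, the linear router, and the choice of 2 active experts out of 8 are hypothetical and do not come from any particular model.

```python
def moe_ffn_params(d_model, d_hidden, num_experts, active_experts):
    """Return (total, active) parameter counts for one MoE feed-forward layer.

    Assumes each expert is a two-matrix feed-forward block and biases are ignored.
    """
    # One expert: d_model x d_hidden up-projection plus d_hidden x d_model down-projection.
    per_expert = 2 * d_model * d_hidden
    # Router: a linear map from d_model to one score per expert (assumed design).
    router = d_model * num_experts
    # All experts are stored, so all of them count toward model size.
    total = num_experts * per_expert + router
    # Only the selected experts run for a given input.
    active = active_experts * per_expert + router
    return total, active


# Hypothetical sizes: 8 experts stored, 2 used per input.
total, active = moe_ffn_params(d_model=1024, d_hidden=4096,
                               num_experts=8, active_experts=2)
print(f"total parameters:  {total:,}")
print(f"active parameters: {active:,}")
print(f"ratio: {total / active:.2f}x")
```

Under these assumptions the stored parameter count grows with the number of experts, while the parameters touched per input grow only with the number of active experts.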

Image: Unknown author, Public domain, via Wikimedia Commons
