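Multi-query attention (MQA) is a variant of multi-head attention in which every query head attends over the same single set of keys and values.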

Multi-query attention (MQA) with a shared KV head: all query (Q) heads share a single key/value (KV) head, which reduces the key/value projection parameters and the size of the KV cache.

Image: Metalicat, CC0, via Wikimedia Commons
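
The sharing shown in the figure can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `multi_query_attention`, the weight names `w_q`, `w_k`, `w_v`, and all shapes are hypothetical, and the sketch omits causal masking, batching, and the final output projection.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_query_attention(x, w_q, w_k, w_v, n_heads):
    """Hypothetical MQA sketch.

    x:   (seq_len, d_model) input activations
    w_q: (d_model, n_heads * d_head) one query projection per head
    w_k: (d_model, d_head) single key projection shared by all heads
    w_v: (d_model, d_head) single value projection shared by all heads
    """
    seq_len = x.shape[0]
    d_head = w_k.shape[1]

    # Per-head queries: (n_heads, seq_len, d_head)
    q = (x @ w_q).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    # One key and one value tensor, shared across heads: (seq_len, d_head)
    k = x @ w_k
    v = x @ w_v

    # k.T is broadcast across the head axis: (n_heads, seq_len, seq_len)
    scores = q @ k.T / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)

    # Shared values are broadcast the same way: (n_heads, seq_len, d_head)
    out = weights @ v

    # Concatenate heads back to (seq_len, n_heads * d_head)
    return out.transpose(1, 0, 2).reshape(seq_len, n_heads * d_head)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))
w_q = rng.normal(size=(16, 4 * 8))
w_k = rng.normal(size=(16, 8))
w_v = rng.normal(size=(16, 8))
out = multi_query_attention(x, w_q, w_k, w_v, n_heads=4)
```

In this sketch the keys and values stored per token have `d_head` entries each, rather than `n_heads * d_head` as in standard multi-head attention, which is where the KV-cache saving comes from during autoregressive decoding.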
