What multi-query attention (MQA) is — all Q heads share a single KV head

MQA: Multi-query attention, in which every query head attends over one shared key-value head, shrinking the KV cache and the memory traffic needed during decoding
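
The idea can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function names (`multi_query_attention`, `kv_cache_entries`), the nested-list layout, and the omission of causal masking, batching, and output projection are all assumptions made here for brevity.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def multi_query_attention(q_heads, k, v):
    """q_heads: [num_heads][seq_len][d_head], one set of queries per head.
    k, v: [seq_len][d_head], a single key/value head shared by all query heads.
    Returns per-head outputs shaped [num_heads][seq_len][d_head]."""
    d_head = len(k[0])
    scale = 1.0 / math.sqrt(d_head)
    outputs = []
    for q in q_heads:  # every query head keeps its own queries...
        head_out = []
        for q_vec in q:
            # ...but is scored against the same shared keys
            scores = [scale * sum(a * b for a, b in zip(q_vec, k_vec)) for k_vec in k]
            weights = softmax(scores)
            # weighted sum over the same shared values
            head_out.append([
                sum(w * v_vec[j] for w, v_vec in zip(weights, v))
                for j in range(d_head)
            ])
        outputs.append(head_out)
    return outputs

def kv_cache_entries(num_kv_heads, seq_len, d_head):
    # Number of cached scalars: keys plus values, for each KV head and position.
    return 2 * num_kv_heads * seq_len * d_head
```

With standard multi-head attention the cache holds one key/value pair per head, so under these assumptions `kv_cache_entries(8, 1024, 64)` is eight times `kv_cache_entries(1, 1024, 64)`; MQA corresponds to the second call regardless of how many query heads there are.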
