What grouped query attention (GQA) does

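GQA shares each key/value (KV) head across a group of query (Q) heads, which cuts KV projection parameters and KV-cache memory compared with giving every query head its own KV head.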
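
As a rough illustration of the head sharing described above, here is a minimal NumPy sketch. The function name `grouped_query_attention`, the tensor layout `(heads, seq, dim)`, and the omission of masking and projection matrices are all assumptions made for brevity; this is not taken from any particular library.

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """Illustrative GQA forward pass (no masking, no learned projections).

    q:    array of shape (n_q_heads, seq_len, head_dim)
    k, v: arrays of shape (n_kv_heads, seq_len, head_dim)
    n_q_heads must be a multiple of n_kv_heads.
    """
    n_q_heads, _, head_dim = q.shape
    n_kv_heads = k.shape[0]
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be divisible by n_kv_heads")
    group_size = n_q_heads // n_kv_heads

    # Query head i attends with KV head i // group_size, so each KV head
    # is reused by group_size consecutive query heads.
    k_shared = np.repeat(k, group_size, axis=0)
    v_shared = np.repeat(v, group_size, axis=0)

    # Scaled dot-product attention, computed independently per query head.
    scores = q @ k_shared.transpose(0, 2, 1) / np.sqrt(head_dim)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v_shared
```

With 8 query heads and 2 KV heads, for example, only 2 sets of keys and values need to be stored per token instead of 8. Setting `n_kv_heads` equal to `n_q_heads` recovers ordinary multi-head attention, and setting it to 1 gives multi-query attention.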

Image: Jouasse, CC BY-SA 4.0, via Wikimedia Commons
