GQA reduces KV-cache memory by the group factor

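Grouped-query attention (GQA) lets each group of query heads share one key/value head, so the KV cache stores only one K and one V per group. With H query heads split into G groups, cache size falls by a factor of H/G (the number of query heads per group) relative to standard multi-head attention.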
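
As a rough illustration of the arithmetic, the sketch below estimates KV-cache size for a standard multi-head layout and for a grouped one. The function name `kv_cache_bytes` and the model dimensions (32 layers, 32 query heads, 8 key/value heads, head size 128, 4096 cached tokens, 2-byte elements) are assumptions chosen for the example, not figures from any particular model.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Estimate KV-cache size in bytes.

    The leading 2 accounts for storing both keys and values.
    Only the number of key/value heads matters here; the number
    of query heads does not affect cache size.
    """
    return (2 * n_layers * n_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


# Hypothetical configuration, for illustration only.
N_LAYERS, N_QUERY_HEADS, HEAD_DIM, SEQ_LEN = 32, 32, 128, 4096

# Multi-head attention: one key/value head per query head.
mha = kv_cache_bytes(N_LAYERS, N_QUERY_HEADS, HEAD_DIM, SEQ_LEN)

# Grouped-query attention: 8 key/value heads, 4 query heads per group.
N_KV_HEADS = 8
gqa = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, SEQ_LEN)

print(f"MHA cache: {mha / 2**30:.2f} GiB")
print(f"GQA cache: {gqa / 2**30:.2f} GiB")
print(f"Reduction: {mha // gqa}x")
```

With these assumed dimensions the cache drops from 2 GiB to 0.5 GiB, a factor of 32 / 8 = 4, which is the number of query heads sharing each key/value head.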

Image: Swilsonmc, CC BY-SA 3.0, via Wikimedia Commons
