Write the multi-head attention formula: MultiHead(Q,K,V) = Concat(head_1,...,head_h)W^O

MultiHead(Q,K,V) = Concat(head_1,...,head_h)W^O, where head_i = Attention(QW_i^Q, KW_i^K, VW_i^V) and Attention(Q,K,V) = softmax(QK^T / sqrt(d_k))V
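
To make the formula concrete, here is a minimal pure-Python sketch, assuming the standard definition head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V) with scaled dot-product attention. The helper names (matmul, softmax, attention, multi_head), the toy sizes, and the selector-style projection matrices are illustrative assumptions, not part of the card; a real model would learn dense projection matrices.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)

def multi_head(q, k, v, w_q, w_k, w_v, w_o):
    """Concat(head_1, ..., head_h) W^O, one projection triple per head."""
    heads = [
        attention(matmul(q, wq_i), matmul(k, wk_i), matmul(v, wv_i))
        for wq_i, wk_i, wv_i in zip(w_q, w_k, w_v)
    ]
    # Concatenate the head outputs along the feature axis, row by row.
    concat = [[x for head in heads for x in head[r]] for r in range(len(q))]
    return matmul(concat, w_o)

# Toy sizes (assumed): d_model = 4, h = 2 heads, d_k = d_v = 2.
select_first = [[1, 0], [0, 1], [0, 0], [0, 0]]  # head 1 reads features 0-1
select_last = [[0, 0], [0, 0], [1, 0], [0, 1]]   # head 2 reads features 2-3
projections = [select_first, select_last]
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

x = [[1.0, 0.0, 2.0, 0.0], [0.0, 1.0, 0.0, 3.0]]  # two tokens, self-attention
out = multi_head(x, x, x, projections, projections, projections, identity)
```

Each head attends over its own projected subspace, and W^O mixes the concatenated results back to d_model features, so `out` has one row per query token and d_model columns.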

Image: Mushki Brichta, CC BY-SA 4.0, via Wikimedia Commons
