self-attention: Attention(Q,K,V) = softmax(QK^T/√d_k)V
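
As a rough illustration of the formula, here is a minimal pure-Python sketch. It assumes Q, K and V are given as lists of rows, and it takes Q = K = V = X for a toy input X. That choice is an assumption made for brevity; in practice the three matrices usually come from separate learned projections of X. The names `softmax`, `attention` and `X` are illustrative only.

```python
import math

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)  # subtracting the max avoids overflow in exp
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention for Q (n x d_k), K (m x d_k), V (m x d_v)."""
    d_k = len(K[0])
    scale = math.sqrt(d_k)
    out = []
    for q in Q:
        # One score per key: dot(q, k) / sqrt(d_k), i.e. one row of QK^T/√d_k
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in K]
        weights = softmax(scores)  # non-negative, sums to 1
        # Output row is the weighted average of the rows of V
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Toy input (assumed): three tokens with two features each.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = attention(X, X, X)  # one output row per query row
```

Each output row is a convex combination of the rows of V, so its entries stay within the range of the corresponding column of V. Dividing by √d_k keeps the scores from growing with the key dimension, which would otherwise push the softmax toward a near one-hot distribution.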

Image: GruenerBogen, CC BY-SA 4.0, via Wikimedia Commons
