What causal masking does — prevents attention to future tokens in the decoder

Causal masking in transformer models prevents attention to future tokens in the decoder, preserving autoregressive property

What causal masking does — prevents attention to future tokens in the decoder

Causal masking in transformer models prevents attention to future tokens in the decoder, preserving autoregressive property

Related concepts

One email a day: 5 concepts + the 5 stories that matter →

Swipe through 100 ML concepts daily

Open TickerNews