Attention (machine learning)

Flash attention speeds up attention computation by processing the input in tiles, which avoids materializing the full N×N attention matrix in memory (N is the sequence length).
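The tiling idea can be illustrated with a small pure-Python sketch. It is not the actual fused GPU kernel; it only assumes the standard online-softmax trick, in which a running maximum and running sum are updated per key/value block so that each query row only ever sees `block_size` scores at a time. The function names `naive_attention` and `tiled_attention` and the `block_size` parameter are hypothetical and chosen for illustration.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def naive_attention(q, k, v):
    """Reference version: builds a full row of N scores for every query."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for q_row in q:
        scores = [dot(q_row, k_row) * scale for k_row in k]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v_row[c] for w, v_row in zip(weights, v)) / total
            for c in range(len(v[0]))
        ])
    return out


def tiled_attention(q, k, v, block_size=2):
    """Tiled version: visits keys/values block by block (illustrative sketch)."""
    scale = 1.0 / math.sqrt(len(q[0]))
    width = len(v[0])
    out = []
    for q_row in q:
        run_max = -math.inf      # running maximum of the scores seen so far
        run_sum = 0.0            # running softmax denominator
        acc = [0.0] * width      # running unnormalized output
        for start in range(0, len(k), block_size):
            k_blk = k[start:start + block_size]
            v_blk = v[start:start + block_size]
            scores = [dot(q_row, k_row) * scale for k_row in k_blk]
            new_max = max(run_max, max(scores))
            # Rescale earlier partial results to the new maximum.
            correction = math.exp(run_max - new_max)
            probs = [math.exp(s - new_max) for s in scores]
            run_sum = run_sum * correction + sum(probs)
            acc = [
                a * correction + sum(p * v_row[c] for p, v_row in zip(probs, v_blk))
                for c, a in enumerate(acc)
            ]
            run_max = new_max
        out.append([a / run_sum for a in acc])
    return out
```

Both functions should return the same values up to floating-point rounding. The tiled version never holds more than `block_size` scores per query at once, which is the property that lets a real implementation keep its working set in fast on-chip memory.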
