`output = exp(row - max_val) / sum(exp(row - max_val))`
Image: RepRapPro, CC BY 3.0, via Wikimedia Commons
`output = exp(row - max_val) / sum(exp(row - max_val))`
to write a vector addition kernel in Triton: load blocks, add, store
```
Softmax function
Softmax converts real numbers into a probability distribution
a Triton kernel is
Triton kernel: Python-based GPU programming that compiles to PTX
fused kernels do
Fused kernels combine multiple operations into one kernel to avoid memory round-trips
tl.dot does in Triton: block-level matrix multiply using tensor cores
tl.dot performs block-level matrix multiplication using tensor cores in Triton
kernel fusion reduces memory bandwidth bottleneck
Kernel fusion reduces memory bandwidth bottleneck by combining multiple operations into a single kernel, minimizing data transfers
One email a day: 5 concepts + the 5 stories that matter →
Swipe through 100 ML concepts daily
Open TickerNews