How to write a fused softmax kernel in Triton: load row, compute max, subtract, exp, sum, divide

`fused_softmax_kernel(input, output): row_max = max_pool2d(input, row_length); exp_diff = exp(input - row_max); softmax_sum = sum(exp_diff, axis=1); output = exp_diff / softmax_sum`

How to write a fused softmax kernel in Triton: load row, compute max, subtract, exp, sum, divide

`fused_softmax_kernel(input, output): row_max = max_pool2d(input, row_length); exp_diff = exp(input - row_max); softmax_sum = sum(exp_diff, axis=1); output = exp_diff / softmax_sum`

Related concepts

One email a day: 5 concepts + the 5 stories that matter →

Swipe through 100 ML concepts daily

Open TickerNews