nvcc compiles CUDA code to PTX and SASS
Image: King of Hearts, CC BY-SA 3.0, via Wikimedia Commons
SASS
SASS is the compiled machine code that executes directly on NVIDIA GPU hardware
Parallel Thread Execution (PTX)
PTX is the intermediate GPU instruction set used in NVIDIA's CUDA platform; it is compiled further into SASS for a specific GPU
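
The two stages above (CUDA source to PTX, then PTX to SASS) can be inspected from the command line. The sketch below only builds the command lines in Python and prints them; it does not run them. The file name `kernel.cu` and the target architecture `sm_80` are assumptions chosen for illustration, and `nvcc_commands` is a hypothetical helper, not part of any NVIDIA tool.

```python
import shlex


def nvcc_commands(source="kernel.cu", arch="sm_80"):
    """Build (but do not run) command lines for each compilation stage.

    `source` and `arch` are illustrative assumptions; substitute your own
    file and the architecture of your GPU.
    """
    stem = source.rsplit(".", 1)[0]
    return {
        # Stop after generating PTX, the intermediate instruction set.
        "ptx": ["nvcc", "--ptx", f"-arch={arch}", source, "-o", f"{stem}.ptx"],
        # Compile all the way to a cubin, which contains SASS machine code.
        "cubin": ["nvcc", "--cubin", f"-arch={arch}", source, "-o", f"{stem}.cubin"],
        # Disassemble the cubin to read the SASS instructions.
        "sass": ["cuobjdump", "--dump-sass", f"{stem}.cubin"],
    }


for stage, cmd in nvcc_commands().items():
    print(f"{stage}: {shlex.join(cmd)}")
```

Running the printed commands requires the CUDA toolkit to be installed; the Python code itself has no such dependency.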
Triton kernel
A Triton kernel is a GPU function written in Python that the Triton compiler lowers to PTX
CUDA
CUDA is NVIDIA's platform and programming model for general-purpose parallel computation on GPUs
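How Triton differs from CUDA
Triton uses block-level programming, where each program instance operates on a whole block of data, while CUDA uses thread-level programming, where the kernel describes the work of a single thread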
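
The difference between the two programming models can be sketched in plain Python. This is an emulation only: it uses neither CUDA nor Triton, the loops stand in for work the GPU would run in parallel, and the names `cuda_style_add` and `triton_style_add` are hypothetical. In the thread-level version the "kernel body" handles one element; in the block-level version it handles a whole block with a mask for out-of-range positions.

```python
def cuda_style_add(x, y, threads_per_block=4):
    """Thread-level view: the kernel body computes ONE element per thread."""
    n = len(x)
    out = [0.0] * n
    num_blocks = (n + threads_per_block - 1) // threads_per_block
    for block_idx in range(num_blocks):              # parallel on a GPU
        for thread_idx in range(threads_per_block):  # parallel on a GPU
            # Kernel body: each thread derives its own global index.
            i = block_idx * threads_per_block + thread_idx
            if i < n:  # bounds check for the last, partial block
                out[i] = x[i] + y[i]
    return out


def triton_style_add(x, y, block_size=4):
    """Block-level view: the kernel body computes a WHOLE block per program."""
    n = len(x)
    out = [0.0] * n
    num_programs = (n + block_size - 1) // block_size
    for pid in range(num_programs):                  # parallel on a GPU
        # Kernel body: work with a vector of offsets instead of one index.
        offsets = [pid * block_size + k for k in range(block_size)]
        mask = [o < n for o in offsets]
        # Masked block loads; out-of-range lanes read a dummy 0.0.
        xs = [x[o] if m else 0.0 for o, m in zip(offsets, mask)]
        ys = [y[o] if m else 0.0 for o, m in zip(offsets, mask)]
        zs = [a + b for a, b in zip(xs, ys)]
        # Masked block store.
        for o, m, z in zip(offsets, mask, zs):
            if m:
                out[o] = z
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0, 20.0, 30.0, 40.0, 50.0]
print(cuda_style_add(x, y))
print(triton_style_add(x, y))
```

Both functions produce the same result; what changes is the unit of work the programmer reasons about.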
TensorRT
TensorRT is NVIDIA's inference optimizer: it speeds up deep learning inference on NVIDIA GPUs by quantizing and fusing operations
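
To make "quantizing" and "fusing" concrete, here is a minimal plain-Python sketch of both ideas. It does not use the TensorRT API and is not a description of TensorRT's internals: the symmetric int8 scheme (scale taken from the largest absolute value) and the scale-bias-ReLU chain are assumptions picked as simple, common examples, and all function names are hypothetical.

```python
def quantize_int8(values):
    """Symmetric int8 quantization: map floats to integers in [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats from the quantized integers."""
    return [qi * scale for qi in q]


def unfused(x, w, b):
    """Three separate passes, each producing an intermediate list."""
    scaled = [xi * w for xi in x]
    shifted = [si + b for si in scaled]
    return [max(0.0, si) for si in shifted]


def fused(x, w, b):
    """One pass computing the same result with no intermediate lists."""
    return [max(0.0, xi * w + b) for xi in x]


weights = [-1.0, -0.25, 0.0, 0.4, 1.0]
q, scale = quantize_int8(weights)
print(q, scale)
print(dequantize(q, scale))
print(unfused(weights, 2.0, 0.5) == fused(weights, 2.0, 0.5))
```

Quantization trades a small rounding error for smaller, faster arithmetic, while fusion keeps the arithmetic identical and removes the intermediate results that would otherwise be written to and read back from memory.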