torch.compile optimizes a model's computation graph by tracing it and compiling it for efficiency

Image: Ingmar Runge, CC BY-SA 3.0, via Wikimedia Commons

What torch.compile does in PyTorch 2.0: traces and optimizes the computation graph

Related concepts

Tracing

Tracing records the operations executed on example inputs; scripting parses the Python source
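
The difference shows up with data-dependent control flow. The sketch below assumes the TorchScript APIs `torch.jit.trace` and `torch.jit.script`; the function `branchy` is hypothetical. Tracing only sees the branch taken for the example input, while scripting keeps the `if`.

```python
import torch

def branchy(x):
    # Hypothetical function whose result depends on the input values.
    if x.sum() > 0:
        return x * 2
    return x + 10

# Tracing runs the function once on a positive example and records only
# the operations it saw (x * 2). PyTorch emits a TracerWarning here.
traced = torch.jit.trace(branchy, torch.ones(3))

# Scripting compiles the Python source, so both branches are preserved.
scripted = torch.jit.script(branchy)

neg = -torch.ones(3)
traced_out = traced(neg)      # still computes x * 2
scripted_out = scripted(neg)  # takes the else branch: x + 10
```

For the negative input, the traced version returns -2 in every position and the scripted version returns 9, because only the scripted version re-evaluates the condition.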

What the Triton @triton.jit decorator does: compiles a Python function into a GPU kernel

The @triton.jit decorator compiles a Python function into a GPU kernel

What XLA does for TensorFlow/JAX: compiles computation graphs for TPU/GPU execution

XLA compiles computation graphs for TPU/GPU execution
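
In JAX, the usual entry point to XLA is `jax.jit`. A minimal sketch, assuming JAX is installed; the function `scaled_softplus` is hypothetical, and the code runs on whatever device JAX finds (CPU if no accelerator is present).

```python
import jax
import jax.numpy as jnp

def scaled_softplus(x):
    # Hypothetical example function built from elementwise operations.
    return 2.0 * jnp.log1p(jnp.exp(x))

# jax.jit traces the function and hands the traced graph to XLA.
fast_softplus = jax.jit(scaled_softplus)

x = jnp.arange(4.0)
jit_out = fast_softplus(x)    # first call triggers tracing and XLA compilation
plain_out = scaled_softplus(x)
```

Both calls should agree up to floating-point tolerance; the jitted version is compiled once per input shape and dtype and then reused.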

Arm architecture family

Arm is the most widely used family of instruction set architectures

What tl.load and tl.store do in Triton: read/write tensors from/to GPU global memory

`tl.load` reads tensors from GPU memory; `tl.store` writes tensors to GPU memory

Greedy vs dynamic programming: greedy makes locally optimal choices, DP considers all subproblems

Greedy: locally optimal choices; DP: considers all subproblems
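
Coin change is a small case where the two strategies disagree. The sketch below is illustrative only: the helper names `greedy_coins` and `dp_coins` and the coin set `[1, 3, 4]` are assumptions chosen to expose the difference.

```python
def greedy_coins(coins, amount):
    """Always take as many of the largest remaining coin as possible."""
    count = 0
    for coin in sorted(coins, reverse=True):
        take, amount = divmod(amount, coin)
        count += take
    return count if amount == 0 else None

def dp_coins(coins, amount):
    """Solve every sub-amount from 0 up to amount and reuse the results."""
    inf = float("inf")
    best = [0] + [inf] * amount  # best[a] = fewest coins that sum to a
    for a in range(1, amount + 1):
        for coin in coins:
            if coin <= a and best[a - coin] + 1 < best[a]:
                best[a] = best[a - coin] + 1
    return best[amount] if best[amount] != inf else None

greedy_result = greedy_coins([1, 3, 4], 6)  # picks 4 + 1 + 1
dp_result = dp_coins([1, 3, 4], 6)          # finds 3 + 3
```

Greedy uses 3 coins here while DP finds the optimal 2, because the locally best first choice (the 4) rules out the better combination.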
