Arithmetic intensity = FLOPs / Bytes accessed
Image: Cancella, CC BY-SA 4.0, via Wikimedia Commons
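As a rough illustration of the arithmetic intensity ratio, the sketch below computes FLOPs per byte for two textbook cases. The function names are hypothetical, and the byte counts assume an idealized model in which every operand is read or written exactly once (no cache misses, no re-reads), with 4-byte elements by default.

```python
def arithmetic_intensity(flops: float, bytes_accessed: float) -> float:
    """Return FLOPs performed per byte moved to or from memory."""
    if bytes_accessed <= 0:
        raise ValueError("bytes_accessed must be positive")
    return flops / bytes_accessed


def matmul_intensity(m: int, n: int, k: int, bytes_per_element: int = 4) -> float:
    """Idealized intensity of C = A @ B (assumption: each matrix moves once)."""
    flops = 2 * m * n * k  # one multiply and one add per inner-product term
    bytes_accessed = bytes_per_element * (m * k + k * n + m * n)
    return arithmetic_intensity(flops, bytes_accessed)


def vector_add_intensity(n: int, bytes_per_element: int = 4) -> float:
    """Intensity of c = a + b: one add per element, two reads and one write."""
    return arithmetic_intensity(n, 3 * n * bytes_per_element)


if __name__ == "__main__":
    print(f"matmul 1024^3: {matmul_intensity(1024, 1024, 1024):.1f} FLOPs/byte")
    print(f"vector add:    {vector_add_intensity(1_000_000):.4f} FLOPs/byte")
```

Under these assumptions the large matrix multiply performs on the order of a hundred FLOPs per byte, while the vector add performs well under one, which is why the former tends to be compute-bound and the latter memory-bound.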
Delay-line memory
CPU speed grows faster than memory speed
Flashbulb memory
Flashbulb memories are vivid but not always accurate
Instruction-level parallelism (ILP) achieves: multiple operations per clock cycle
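Triton auto-tunes BLOCK_SIZE: different sizes are optimal on different hardware
Triton's @triton.autotune decorator benchmarks a user-supplied list of candidate configurations (such as BLOCK_SIZE values) and keeps the fastest one for the current hardware, balancing memory access patterns against computational throughput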
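The idea behind tuning BLOCK_SIZE can be sketched without a GPU. The code below is not Triton's API: it is a plain-Python stand-in, with hypothetical names `blocked_sum` and `autotune_block_size`, that times a toy kernel once per candidate block size and keeps the fastest, which is the same benchmark-and-select loop an autotuner runs.

```python
import time


def blocked_sum(data, block_size):
    """Toy kernel: sum a list by processing it in chunks of block_size."""
    total = 0.0
    for start in range(0, len(data), block_size):
        total += sum(data[start:start + block_size])
    return total


def autotune_block_size(kernel, data, candidates, repeats=3):
    """Time kernel(data, block_size) for each candidate; return the fastest."""
    timings = {}
    for block_size in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            kernel(data, block_size)
            # Keep the best of several runs to reduce timing noise.
            best = min(best, time.perf_counter() - start)
        timings[block_size] = best
    fastest = min(timings, key=timings.get)
    return fastest, timings


data = [float(i) for i in range(100_000)]
best_block, timings = autotune_block_size(blocked_sum, data, [64, 256, 1024, 4096])
```

Which candidate wins depends on the machine and the input size, so the selected value is not stable across systems; that variability is the reason the choice is measured rather than hard-coded.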
Dynamic random-access memory
DRAM requires periodic refreshing to maintain data integrity
Quantization to INT8 can double throughput
Quantizing from FP16 to INT8 can roughly double peak throughput on GPUs whose tensor cores process INT8 at about 2x the FP16 rate; each value also occupies half the memory
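For readers who want to see what INT8 quantization does to the values themselves, here is a minimal sketch of symmetric per-tensor quantization. The scheme, the function names, and the sample weights are assumptions chosen for illustration; real frameworks offer several variants (per-channel scales, zero points, calibration) that are not shown.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the INT8 range used here.
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Map INT8 codes back to approximate float values."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.02, 1.0]  # hypothetical sample values
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# Storage per tensor: FP16 uses 2 bytes per value, INT8 uses 1.
fp16_bytes = 2 * len(weights)
int8_bytes = 1 * len(weights)
```

Each restored value differs from the original by at most half a quantization step, and the INT8 tensor takes half the bytes of its FP16 counterpart, which is the memory-side half of the throughput gain.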