What GPTQ quantization does

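GPTQ is a post-training quantization method for model compression: it quantizes a trained model's weights layer by layer, using approximate second-order (Hessian) information to compensate for the rounding error it introduces.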
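To make "second-order information" concrete, here is the layer-wise objective GPTQ is usually described as minimizing. The symbols are notation introduced for this sketch, not taken from the card: W is a layer's original weight matrix, X is that layer's input on a small calibration set, and \widehat{W} is the quantized weight matrix.

```latex
\arg\min_{\widehat{W}} \; \lVert W X - \widehat{W} X \rVert_2^2,
\qquad H = 2 X X^{\top}
```

H is the Hessian of this squared error with respect to one row of weights. It depends only on the layer inputs, which is why a small calibration set is enough to estimate it.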


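What AWQ does differently

AWQ (Activation-aware Weight Quantization) uses activation statistics to find the small fraction of weight channels that matter most for model performance and protects them during quantization, whereas traditional quantization treats all weights equally.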

GPTQ vs AWQ

GPTQ uses Hessian (second-order) information to choose quantized weights that minimize each layer's output error. AWQ looks at activations to decide which weights are most important and protects those before quantizing.

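Why quantization to INT8 can double throughput

Quantization to INT8 can roughly double throughput relative to FP16, because tensor cores on many GPUs have about 2x the peak rate for INT8 as for FP16, and INT8 values take half the memory. The actual speedup depends on the hardware, kernel support, and whether the workload is compute-bound or memory-bound.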
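The throughput gain depends on hardware, but the mapping to INT8 itself is easy to sketch. The example below is a minimal illustration of symmetric absmax quantization, one common scheme and not necessarily the one any particular library uses. The function names and the sample weights are made up for this sketch.

```python
def quantize_int8(values):
    """Symmetric absmax quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # One shared scale for the whole list; fall back to 1.0 for all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate floats from the INT8 codes."""
    return [x * scale for x in q]


weights = [0.5, -1.27, 0.03, 1.0]  # illustrative values
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
print(q)
print(restored)
```

Each restored value differs from the original by at most half of `scale`, which is the rounding error that methods such as GPTQ and AWQ try to make less harmful.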

Vector quantization

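Product quantization is a form of vector quantization. It compresses vectors by splitting each one into subvectors and quantizing each subvector independently against its own codebook, so a vector is stored as a short list of codebook indices.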
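The split-and-quantize step can be sketched in a few lines. In practice the codebooks are usually learned, for example with k-means. Here they are tiny hand-picked tables, and all names (`pq_encode`, `pq_decode`, `codebooks`) are hypothetical, chosen only for this illustration.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pq_encode(vector, codebooks):
    """Split the vector into len(codebooks) equal subvectors and store,
    for each one, the index of its nearest centroid."""
    sub_dim = len(vector) // len(codebooks)
    codes = []
    for i, codebook in enumerate(codebooks):
        sub = vector[i * sub_dim:(i + 1) * sub_dim]
        nearest = min(range(len(codebook)),
                      key=lambda k: squared_distance(sub, codebook[k]))
        codes.append(nearest)
    return codes


def pq_decode(codes, codebooks):
    """Rebuild an approximate vector by concatenating the chosen centroids."""
    out = []
    for code, codebook in zip(codes, codebooks):
        out.extend(codebook[code])
    return out


# Two sub-codebooks with two 2-d centroids each (hand-picked, not learned).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
vector = [0.9, 1.2, 0.8, 0.1]
codes = pq_encode(vector, codebooks)
approx = pq_decode(codes, codebooks)
print(codes, approx)
```

The 4-d vector is stored as two small indices instead of four floats, at the cost of the approximation error between `vector` and `approx`.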

Shannon's source coding theorem: you can't compress below entropy

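Shannon's source coding theorem: no lossless code can use fewer bits per symbol, on average, than the entropy of the source, though good codes can get arbitrarily close to that limit.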
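A small worked example makes the bound tangible. The sketch below computes the empirical entropy of a short message and compares it with the average length of a hand-written prefix code. The message and the code table are made up for illustration, and the symbol frequencies are treated as the true source probabilities.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(message):
    """Shannon entropy of the empirical symbol distribution, in bits."""
    counts = Counter(message)
    total = len(message)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


message = "aaaabbcd"  # probabilities: a=1/2, b=1/4, c=1/8, d=1/8
h = entropy_bits_per_symbol(message)

# A prefix code chosen by hand: frequent symbols get shorter codewords.
code = {"a": "0", "b": "10", "c": "110", "d": "111"}
avg_len = sum(len(code[s]) for s in message) / len(message)

print(h, avg_len)
```

Because every probability here is a power of 1/2, this code reaches the entropy exactly. For other distributions the average length of a symbol-by-symbol code stays somewhat above it.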

ALiBi allows length extrapolation better than learned position embeddings

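ALiBi (Attention with Linear Biases) adds no position embeddings. Instead, it subtracts from each attention score a penalty that grows linearly with the distance between query and key. Because the bias depends only on relative distance, there is no fixed-size embedding table, so the model copes better with variable-length sequences, including ones longer than those seen in training.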
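The bias itself is simple enough to write out. This is a minimal sketch rather than a reference implementation: it assumes the number of heads is a power of two, uses the geometric slope schedule commonly described for ALiBi, and folds the causal mask into the same matrix for readability. The function names are made up.

```python
def alibi_slopes(num_heads):
    """Per-head slopes 2^(-8/n), 2^(-16/n), ..., 2^(-8).
    Assumes num_heads is a power of two."""
    return [2 ** (-8 * (h + 1) / num_heads) for h in range(num_heads)]


def alibi_bias(seq_len, slope):
    """Bias added to attention scores for one head.
    Entry [i][j] is slope * (j - i) for past or current keys (j <= i),
    and -inf for future keys (causal mask, folded in here for brevity)."""
    return [
        [slope * (j - i) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]


slopes = alibi_slopes(8)
bias = alibi_bias(3, slopes[0])
for row in bias:
    print(row)
```

Nothing in `alibi_bias` depends on a maximum length, so the same function produces a valid bias for any `seq_len`, which is the property that supports length extrapolation.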
