AWQ quantizes weights to low precision while protecting the small set of salient weight channels, identified by activation magnitude rather than weight magnitude, for neural network efficiency

What AWQ does differently — activation-aware weight quantization preserves important weights

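A minimal sketch of the activation-aware idea, assuming a toy linear layer, synthetic calibration inputs, and one quantization scale per output row (real AWQ uses group-wise quantization on actual model layers). Each input channel j gets a scale s_j = (mean |x_j|)^alpha; the weights are multiplied by s before round-to-nearest quantization and divided by s afterwards, and alpha is grid-searched to minimize the layer's output error. The helper names (`quantize_row`, `awq_quantize`, etc.) are hypothetical.

```python
import random

def quantize_row(row, n_bits=3):
    """Symmetric round-to-nearest quantization of one weight row with a single scale."""
    qmax = 2 ** (n_bits - 1) - 1
    step = max(abs(v) for v in row) / qmax or 1.0
    return [round(v / step) * step for v in row]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def output_error(W, Wq, X):
    """Mean squared error of the layer output W @ x over calibration inputs X."""
    total = 0.0
    for x in X:
        total += sum((a - b) ** 2 for a, b in zip(matvec(W, x), matvec(Wq, x)))
    return total / len(X)

def awq_quantize(W, X, n_bits=3, grid=20):
    """Grid-search alpha for per-input-channel scales s_j = mean|x_j| ** alpha."""
    n_in = len(W[0])
    # Activation statistics, not weight magnitudes, decide which channels matter.
    act = [max(sum(abs(x[j]) for x in X) / len(X), 1e-8) for j in range(n_in)]
    best = None
    for i in range(grid + 1):
        alpha = i / grid
        s = [a ** alpha for a in act]  # alpha = 0 reduces to plain round-to-nearest
        # Quantize W * s column-wise, then divide by s again; this is equivalent to
        # storing the scaled quantized weights and feeding x / s at inference time.
        Wq = [[q / sj for q, sj in zip(quantize_row([w * sj for w, sj in zip(row, s)], n_bits), s)]
              for row in W]
        err = output_error(W, Wq, X)
        if best is None or err < best[0]:
            best = (err, alpha, Wq)
    return best

random.seed(0)
# Synthetic calibration data: input channels 0 and 1 carry much larger activations.
X = [[random.gauss(0, 10.0 if j < 2 else 1.0) for j in range(16)] for _ in range(64)]
W = [[random.gauss(0, 1) for _ in range(16)] for _ in range(8)]
rtn_err = output_error(W, [quantize_row(row) for row in W], X)
awq_err, alpha, _ = awq_quantize(W, X)
print(f"RTN MSE {rtn_err:.3f}  AWQ MSE {awq_err:.3f}  alpha={alpha}")
```

Because alpha = 0 is in the search grid, the scaled result can never be worse than plain round-to-nearest on the calibration set; any improvement comes from giving high-activation channels finer effective resolution.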

Related concepts

How does batch normalization help train deep neural networks by normalizing each feature across the mini-batch to zero mean and unit variance, so that convergence is faster and generalization often improves?

Batch normalization stabilizes and accelerates deep learning training by normalizing each layer's activations over the mini-batch and then rescaling them with learnable parameters γ and β
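A minimal sketch of the training-mode forward pass, assuming a mini-batch stored as a list of feature rows. Each feature column is normalized with its own batch mean and (biased) batch variance, then scaled by γ and shifted by β. Inference normally swaps in running averages of these statistics, which this sketch omits; `batch_norm` is a hypothetical name.

```python
import math

def batch_norm(batch, gamma, beta, eps=1e-5):
    """Training-mode batch norm: normalize each feature over the mini-batch."""
    n, n_feat = len(batch), len(batch[0])
    out = [[0.0] * n_feat for _ in range(n)]
    for j in range(n_feat):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n  # biased variance over the batch
        for i, v in enumerate(col):
            # Normalize, then apply the learnable per-feature scale and shift.
            out[i][j] = gamma[j] * (v - mean) / math.sqrt(var + eps) + beta[j]
    return out

# Two features on very different scales end up on a comparable scale.
batch = [[1.0, 200.0], [2.0, 220.0], [3.0, 180.0], [4.0, 240.0]]
out = batch_norm(batch, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```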

What LoRA does — adds trainable low-rank matrices A and B where ΔW = BA

LoRA: Freezes the pretrained weights W and learns a low-rank update ΔW = BA, where B is d × r, A is r × k, and r ≪ min(d, k)
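A minimal sketch of a LoRA-wrapped linear layer in plain Python, assuming the common conventions of a zero-initialized B (so the wrapped layer initially matches the frozen one) and an alpha / r scaling factor. The class name and defaults are illustrative, not a specific library's API.

```python
import random

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update delta_W = B @ A."""

    def __init__(self, W, r=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # frozen: never updated during fine-tuning
        self.A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]  # r x d_in
        self.B = [[0.0] * r for _ in range(d_out)]  # d_out x r, zero so delta_W = 0 at start
        self.scale = alpha / r

    def forward(self, x):
        base = matvec(self.W, x)
        low = matvec(self.B, matvec(self.A, x))  # B(Ax): delta_W is never materialized
        return [b + self.scale * l for b, l in zip(base, low)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

rng = random.Random(1)
W = [[rng.gauss(0, 1) for _ in range(64)] for _ in range(64)]
layer = LoRALinear(W, r=2)
x = [rng.gauss(0, 1) for _ in range(64)]
print(layer.trainable_params(), "trainable vs", 64 * 64, "frozen")
```

For a 64 × 64 layer with r = 2, only 2 · 64 + 64 · 2 = 256 values are trained instead of 4,096.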

Why ALiBi allows length extrapolation better than learned position embeddings

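ALiBi adds no position embeddings; instead it biases each attention score by a head-specific penalty proportional to the query-key distance, so models can run on sequences longer than those seen in training without retraining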
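A minimal sketch of the causal ALiBi bias, assuming a power-of-two head count so the slopes form the geometric sequence 2^(-8/n), 2^(-16/n), ... Query i attending to key j ≤ i gets −m · (i − j) added to its score before the softmax. Because the bias is computed from distance alone, it exists for any sequence length; there is no fixed-size table to outgrow. Function names are hypothetical.

```python
import math

def alibi_slopes(n_heads):
    """Geometric per-head slopes 2^(-8/n), 2^(-16/n), ..., assuming n_heads is a power of 2."""
    return [2 ** (-8 * (h + 1) / n_heads) for h in range(n_heads)]

def alibi_bias(seq_len, slope):
    """Causal bias: query i attending to key j <= i gets -slope * (i - j); future keys are masked."""
    return [[-slope * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
            for i in range(seq_len)]

def attention_weights(scores_row, bias_row):
    """Softmax over content scores plus the ALiBi distance penalty."""
    logits = [s + b for s, b in zip(scores_row, bias_row)]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

slopes = alibi_slopes(8)
bias = alibi_bias(5, slopes[0])
# With identical content scores, nearer keys still receive more attention.
w = attention_weights([0.0] * 5, bias[4])
```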

What score matching does: learns the gradient of the log-density without normalizing

Score matching fits the gradient of the log-density, ∇ₓ log p(x), directly, so the intractable normalizing constant never has to be computed
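A minimal sketch of explicit (Hyvärinen) score matching on a 1-D Gaussian model, chosen only because its score −λ(x − μ) has a simple derivative −λ. The objective E[½ s(x)² + s′(x)] involves only the score and its derivative, so the Gaussian's normalizer never appears, yet gradient descent recovers the sample mean and inverse variance. Practical models use neural score networks and usually denoising variants; the function names here are hypothetical.

```python
import random

def score(x, mu, lam):
    """Model score d/dx log p(x) for an unnormalized Gaussian with precision lam."""
    return -lam * (x - mu)

def sm_objective(data, mu, lam):
    """Hyvarinen objective E[0.5 * s(x)^2 + ds/dx], where ds/dx = -lam; no normalizer needed."""
    return sum(0.5 * score(x, mu, lam) ** 2 - lam for x in data) / len(data)

def fit(data, steps=20000, lr=0.05):
    """Gradient descent on the objective, using sufficient statistics for speed."""
    n = len(data)
    m1 = sum(data) / n
    m2 = sum(x * x for x in data) / n
    mu, lam = 0.0, 1.0
    for _ in range(steps):
        sq = m2 - 2 * mu * m1 + mu * mu  # E[(x - mu)^2]
        g_mu = -lam ** 2 * (m1 - mu)     # dJ/dmu
        g_lam = lam * sq - 1             # dJ/dlam
        mu, lam = mu - lr * g_mu, lam - lr * g_lam
    return mu, lam

random.seed(0)
data = [random.gauss(3.0, 2.0) for _ in range(1000)]
mu, lam = fit(data)
print(f"mu={mu:.3f}, variance={1 / lam:.3f}")
```

Setting both gradients to zero gives μ = mean and λ = 1 / variance, the same answer maximum likelihood would give for this model.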

What weight tying does in language models: shares embedding and output projection matrices

Language models use weight tying to reuse the input embedding matrix as the output projection, removing an entire vocabulary-by-hidden-size block of parameters
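A minimal sketch of weight tying, assuming a toy model whose only parameters are one V × d matrix E. The input side looks up a row of E; the output side scores a hidden vector h against every row of the same E (logits = E h), so any gradient update to E affects both roles. The class is hypothetical and omits the transformer layers in between.

```python
import random

class TiedLM:
    """Toy LM ends sharing one V x d matrix for input embedding and output projection."""

    def __init__(self, vocab_size, d_model, seed=0):
        rng = random.Random(seed)
        self.E = [[rng.gauss(0, 0.02) for _ in range(d_model)] for _ in range(vocab_size)]

    def embed(self, token_id):
        return self.E[token_id]  # input side: row lookup in E

    def logits(self, h):
        # Output side: the same E, one dot product per vocabulary entry (E @ h).
        return [sum(e * hi for e, hi in zip(row, h)) for row in self.E]

    def n_params(self):
        return len(self.E) * len(self.E[0])  # an untied model would store 2x this

lm = TiedLM(vocab_size=10, d_model=4)
h = lm.embed(3)
scores = lm.logits(h)
```

With an illustrative vocabulary of 50,000 and hidden size 768, tying saves 50,000 × 768 = 38.4M parameters.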

What multi-query attention (MQA) is — all Q heads share a single KV head

MQA: All query heads share a single key-value head, which shrinks the KV cache and speeds up incremental decoding
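A minimal sketch of one MQA decoding step, assuming the current token's per-head queries are already computed and the layer keeps a single cached K and V sequence shared by all heads. Each query head runs ordinary scaled dot-product attention over the same cache, so the cache holds one K/V head instead of one per query head. Function names are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def mqa_decode_step(queries, k_cache, v_cache):
    """queries: n_heads x d_head; k_cache, v_cache: seq_len x d_head, shared by every head."""
    d_head = len(k_cache[0])
    outputs = []
    for q in queries:  # every query head attends over the same keys and values
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_head) for k in k_cache]
        w = softmax(scores)
        outputs.append([sum(wt * v[c] for wt, v in zip(w, v_cache)) for c in range(d_head)])
    return outputs

def kv_cache_floats(seq_len, n_heads, d_head, n_kv_heads):
    """Floats in one layer's K and V caches: MHA uses n_kv_heads = n_heads, MQA uses 1."""
    return 2 * seq_len * n_kv_heads * d_head

queries = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
k_cache = [[1.0, 1.0]] * 4  # identical keys give uniform attention weights
v_cache = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
out = mqa_decode_step(queries, k_cache, v_cache)
mha = kv_cache_floats(1024, n_heads=8, d_head=64, n_kv_heads=8)
mqa = kv_cache_floats(1024, n_heads=8, d_head=64, n_kv_heads=1)
```

With 8 heads of size 64 and 1,024 cached tokens, one layer's cache drops from 1,048,576 to 131,072 floats, an 8× reduction equal to the number of query heads.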
