What QLoRA adds

QLoRA quantizes the frozen base model to 4-bit and trains LoRA adapters on top of it in 16-bit precision (FP16 or BF16)
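
A minimal sketch of the idea, assuming plain absmax rounding to signed 4-bit codes (QLoRA itself uses a different 4-bit data type, NF4, and real tensors rather than Python lists). The function names and the rank-1 adapter are hypothetical, chosen only for illustration:

```python
def quantize_4bit(weights):
    """Absmax-quantize floats to signed 4-bit codes in [-7, 7] (illustrative, not NF4)."""
    scale = max(abs(w) for w in weights) / 7 or 1.0  # fall back to 1.0 for all-zero input
    codes = [max(-7, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_4bit(codes, scale):
    """Map 4-bit codes back to approximate float weights."""
    return [c * scale for c in codes]


def forward(x, codes, scale, a, b):
    """One output unit: frozen 4-bit base path plus a trainable rank-1 adapter."""
    # Frozen path: the stored codes are dequantized on the fly and never updated.
    frozen = sum(c * scale * xi for c, xi in zip(codes, x))
    # Adapter path: `a` and `b` stay in floating point and are the only trained values.
    adapter = b * sum(ai * xi for ai, xi in zip(a, x))
    return frozen + adapter


base = [0.70, -0.30, 0.12, 0.00]
codes, scale = quantize_4bit(base)
restored = dequantize_4bit(codes, scale)

x = [1.0, 2.0, 3.0, 4.0]
a = [0.01, 0.02, 0.03, 0.04]
# With b = 0.0 (LoRA's usual zero init for one factor) the adapter contributes nothing yet.
y0 = forward(x, codes, scale, a, 0.0)
```

The base weights are stored only as `codes` plus one `scale`, which is where the memory saving comes from; the adapter values remain ordinary floats.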

Related concepts

LoRA vs full fine-tuning

LoRA trains low-rank (rank-r) adapters, roughly 0.1% of the parameters depending on r and model size; full fine-tuning updates every weight

LoRA (machine learning)

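LoRA uses a rank r much smaller than the weight dimension d (r << d), so the adapter has far fewer trainable parameters than the full weight matrix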

The LoRA rank r controls the adapter's capacity and its number of trainable parameters
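
To make the dependence on r concrete, here is a small sketch that counts adapter parameters for a single weight matrix, assuming the standard factorization into two matrices of shapes (d_out, r) and (r, d_in). The helper names and the 4096 x 4096 layer size are illustrative assumptions:

```python
def lora_params(d_in, d_out, r):
    """Trainable parameters in a rank-r adapter: B is (d_out, r), A is (r, d_in)."""
    return r * (d_in + d_out)


def full_params(d_in, d_out):
    """Parameters updated by full fine-tuning of the same matrix."""
    return d_in * d_out


d = 4096  # hypothetical hidden size
for r in (2, 8, 64):
    frac = lora_params(d, d, r) / full_params(d, d)
    print(f"r={r}: {lora_params(d, d, r)} adapter params, {frac:.4%} of the full matrix")
```

The count grows linearly in r, so doubling the rank doubles the trainable parameters while the frozen matrix stays the same size.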

What XLA does for TensorFlow/JAX

XLA compiles computation graphs into optimized machine code for accelerators such as TPUs and GPUs

What AWQ does differently

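AWQ (Activation-aware Weight Quantization) uses activation statistics to find the small fraction of weights that matter most and protects them by scaling before quantization, unlike traditional quantization, which treats all weights uniformly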
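
A toy sketch of the activation-aware idea, under several assumptions: one weight row, 4-bit absmax rounding, strictly positive activation magnitudes, and a fixed exponent `alpha = 0.5` (AWQ searches for this value rather than fixing it). The function names are hypothetical:

```python
def quantize_dequantize(weights, levels=7):
    """Round to signed 4-bit levels with one absmax scale, then map back to floats."""
    scale = max(abs(w) for w in weights) / levels or 1.0
    return [max(-levels, min(levels, round(w / scale))) * scale for w in weights]


def awq_like(weights, act_mag, alpha=0.5):
    """Scale each input channel by act_mag**alpha before rounding, then undo the scale."""
    s = [m ** alpha for m in act_mag]  # assumes every magnitude is > 0
    scaled = [w * sj for w, sj in zip(weights, s)]
    deq = quantize_dequantize(scaled)
    # Dividing by s here stands in for folding 1/s into the incoming activations.
    return [q / sj for q, sj in zip(deq, s)]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


w = [0.02, 0.9, -0.5, 0.3]   # channel 0 has a tiny weight ...
x = [64.0, 1.0, 1.0, 1.0]    # ... but sees by far the largest activation
act_mag = [abs(v) for v in x]

plain_err = abs(dot(w, x) - dot(quantize_dequantize(w), x))
awq_err = abs(dot(w, x) - dot(awq_like(w, act_mag), x))
print(f"plain error: {plain_err:.3f}, activation-aware error: {awq_err:.3f}")
```

In this example plain rounding sends the small but important weight to zero, while the scaled version keeps a nonzero code for it, so the output error is smaller even though every weight is still stored in 4 bits.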
