LoRA rank r controls model capacity and parameters
Image: Saloni Dattani, CC BY 4.0, via Wikimedia Commons
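In LoRA, the rank r sets the capacity of the adapter and the number of trainable parameters it adds; the pretrained weights stay frozen. For a weight matrix of shape d × k, LoRA learns an update BA with B of shape d × r and A of shape r × k, so the adapter has r(d + k) parameters. A higher rank allows a more expressive update at a proportionally higher parameter cost.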
Example
A LoRA adapter with rank r=32 has twice as many trainable parameters as one with rank r=16 on the same layers, and can represent a higher-rank weight update.
Choosing r is therefore a trade-off between adapter capacity and the memory and compute spent on training it.
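The dependence of adapter size on rank can be checked with a few lines of arithmetic. The sketch below assumes a single 4096 × 4096 weight matrix (a hypothetical layer size, not taken from the text above) and the standard LoRA factorization into a d × r matrix and an r × k matrix. The function name lora_param_count is illustrative only.

```python
def lora_param_count(d: int, k: int, r: int) -> int:
    """Trainable parameters in one LoRA adapter for a d x k weight matrix.

    The update is B @ A with B of shape (d, r) and A of shape (r, k),
    so the adapter holds d*r + r*k = r*(d + k) parameters.
    """
    return r * (d + k)


# Hypothetical layer size, chosen only for illustration.
d = k = 4096
full = d * k  # parameters in the frozen weight matrix itself

for r in (16, 32):
    n = lora_param_count(d, k, r)
    print(f"r={r}: {n:,} adapter params ({n / full:.2%} of the full matrix)")
```

Under these assumptions, doubling the rank from 16 to 32 doubles the adapter from 131,072 to 262,144 parameters, both a small fraction of the 16,777,216 parameters in the full matrix.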
LoRA (machine learning)
LoRA uses a rank r much smaller than the layer dimension d (r << d), so the adapter is far smaller than the weight matrix it adapts
LoRA vs full fine-tuning: LoRA trains rank-r adapters (~0.1% params), full FT updates everything
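LoRA freezes the pretrained weights and trains only rank-r adapters (on the order of 0.1% of the parameters, depending on rank and which layers are adapted); full fine-tuning updates every weight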
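To see where a figure on the order of 0.1% can come from, here is a rough back-of-the-envelope sketch. Every number in it is an assumption made for illustration: a hypothetical 7-billion-parameter transformer with hidden size 4096 and 32 layers, adapters of rank 8 on two square d_model × d_model projections per layer. The helper lora_trainable_fraction is not part of any library.

```python
def lora_trainable_fraction(total_params, d_model, n_layers, adapted_per_layer, r):
    """Return (adapter parameter count, fraction of all model parameters).

    Assumes every adapted matrix is square (d_model x d_model), so each
    adapter holds r * (d_model + d_model) parameters.
    """
    per_adapter = r * (d_model + d_model)
    lora_params = n_layers * adapted_per_layer * per_adapter
    return lora_params, lora_params / total_params


# All values below are hypothetical, chosen only to illustrate the scale.
lora_params, fraction = lora_trainable_fraction(
    total_params=7_000_000_000,
    d_model=4096,
    n_layers=32,
    adapted_per_layer=2,
    r=8,
)
print(f"LoRA trains {lora_params:,} params, {fraction:.3%} of the model")
```

With these assumed values the adapters total 4,194,304 parameters, under 0.1% of the model, whereas full fine-tuning would update all 7 billion. A higher rank or more adapted matrices raises the fraction accordingly.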
MoE models have more parameters but similar compute cost
MoE models spread their parameters across many experts but route each token to only a few of them (top-k), so per-token compute stays close to that of a much smaller dense model
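The gap between total and active parameters can be sketched with simple counting. The example assumes a hypothetical model with 8 experts and top-2 routing, and treats per-token compute as roughly proportional to the active parameter count (router overhead is ignored). The function name and all sizes are illustrative.

```python
def moe_param_counts(shared_params, params_per_expert, num_experts, top_k):
    """Return (total parameters, parameters active for a single token).

    shared_params: parameters every token uses (attention, embeddings, ...).
    params_per_expert: parameters of one expert, summed over all MoE layers.
    """
    total = shared_params + num_experts * params_per_expert
    active = shared_params + top_k * params_per_expert
    return total, active


# Hypothetical sizes, for illustration only.
total, active = moe_param_counts(
    shared_params=1_000_000_000,
    params_per_expert=500_000_000,
    num_experts=8,
    top_k=2,
)
print(f"total: {total:,}  active per token: {active:,}")
```

Here the model stores 5 billion parameters but uses only 2 billion for any one token, which is why its compute cost resembles that of a smaller dense model.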
Neural scaling law
Chinchilla scaling law: compute-optimal model size and training tokens each scale roughly with the square root of the compute budget (about 20 tokens per parameter)
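A minimal sketch of the Chinchilla rule of thumb, assuming the common approximations that training compute is C ≈ 6·N·D FLOPs (N parameters, D tokens) and that the compute-optimal ratio is about 20 tokens per parameter. Both constants are approximations, not exact fitted values, and the function name is illustrative. Under these assumptions N grows with the square root of C, not linearly.

```python
import math


def chinchilla_optimal(compute_flops, tokens_per_param=20.0):
    """Approximate compute-optimal (parameters, tokens) for a FLOP budget.

    Assumes C = 6 * N * D and D = tokens_per_param * N, which gives
    C = 6 * tokens_per_param * N**2, so N = sqrt(C / (6 * tokens_per_param)).
    """
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


# Quadrupling the compute budget doubles both model size and token count.
for c in (1e21, 4e21):
    n, d = chinchilla_optimal(c)
    print(f"C={c:.0e} FLOPs -> N~{n:.2e} params, D~{d:.2e} tokens")
```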
Vocabulary size matters: larger vocab = shorter sequences but more parameters
A larger vocabulary encodes the same text in fewer tokens, but the embedding table (and the output projection, if it is not tied to it) grows in proportion to vocabulary size
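The parameter side of this trade-off is easy to quantify. The sketch below assumes a hidden size of 4096 and compares two hypothetical vocabulary sizes; the function name is illustrative. The sequence-length side depends on the tokenizer and the text, so it is not modeled here.

```python
def embedding_params(vocab_size, d_model, tied=True):
    """Parameters in the token embedding table, plus the output projection
    when it is a separate (untied) matrix of the same shape."""
    table = vocab_size * d_model
    return table if tied else 2 * table


# Hypothetical sizes, for illustration only.
d_model = 4096
for vocab_size in (32_000, 128_000):
    tied = embedding_params(vocab_size, d_model, tied=True)
    untied = embedding_params(vocab_size, d_model, tied=False)
    print(f"vocab={vocab_size:,}: {tied:,} params tied, {untied:,} untied")
```

Under these assumptions, going from a 32,000-token to a 128,000-token vocabulary quadruples the embedding parameters, from 131,072,000 to 524,288,000 with tied weights.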