Knowledge distillation transfers knowledge from a large model to a smaller one with little loss of accuracy

Image: U.S. Navy Photo by Mass Communication Specialist 3rd Class Jeff Johnstone, Public domain, via Wikimedia Commons

Knowledge distillation

Knowledge distillation is a machine learning technique for transferring knowledge from a large model (the teacher) to a smaller one (the student). The student is trained to reproduce the teacher's outputs, so it retains much of the larger model's predictive performance at a fraction of the computational expense.

Example

A large neural network trained on a dataset can be distilled into a smaller network that performs nearly as well on the same tasks, making it more efficient for deployment on devices with limited resources.

Knowledge distillation enables the use of smaller, more efficient models while maintaining high performance, which is crucial for deploying machine learning applications on devices with limited computational power.
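As a rough illustration, here is a minimal NumPy sketch of one common way to set up the training objective, assuming the widely used soft-target formulation: the student matches the teacher's temperature-softened probabilities while still fitting the true labels. The function names and the `temperature` and `alpha` defaults are illustrative assumptions, not values taken from this page.

```python
import numpy as np

EPS = 1e-12  # guards against log(0)


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled = scaled - scaled.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum(axis=-1, keepdims=True)


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target term and a hard-label term.

    temperature and alpha are hypothetical defaults chosen for illustration.
    """
    # Soft targets: teacher probabilities softened by the temperature.
    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)

    # KL divergence from the student's softened distribution to the teacher's.
    kl = np.sum(
        p_teacher * (np.log(p_teacher + EPS) - np.log(p_student_soft + EPS)),
        axis=-1,
    )
    # Scaling by temperature**2 keeps the soft term's gradients comparable
    # in magnitude across different temperatures.
    soft_term = (temperature ** 2) * kl.mean()

    # Hard labels: ordinary cross-entropy at temperature 1.
    p_student = softmax(student_logits)
    rows = np.arange(len(labels))
    hard_term = -np.log(p_student[rows, labels] + EPS).mean()

    return alpha * soft_term + (1.0 - alpha) * hard_term


# Usage: a batch of two examples with three classes.
teacher_logits = np.array([[4.0, 1.0, 0.0], [0.0, 3.0, 1.0]])
student_logits = np.array([[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]])
labels = np.array([0, 1])
loss = distillation_loss(student_logits, teacher_logits, labels)
```

In this sketch a higher `temperature` flattens both distributions, which exposes more of the teacher's relative preferences among the incorrect classes, and `alpha` trades off imitation of the teacher against fitting the ground-truth labels.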

Related concepts

Prompt engineering

The GenAI model learns tasks from examples in the prompt

Retrieval-augmented generation

RAG enables LLMs to access new information without retraining

Reasoning model

RLMs excel in logic, math, and programming tasks

Soft targets carry more information than hard labels: they encode class similarities

Soft targets carry more information than hard labels because they encode class similarities

MoE models have more parameters than dense models but similar compute cost

MoE models spread their parameters across many experts and activate only a few of them per input, which keeps the compute cost per input low

Graduate Aptitude Test in Engineering

GATE exam assesses engineering and science undergraduate subjects for postgraduate admissions in India
