Why Knowledge Distillation Works in Generative Models: A Minimal Working Explanation

Abstract

Knowledge distillation (KD) is a core component in the training and deployment of modern generative models, particularly large language models (LLMs). While its empirical benefits are well documented (smaller student models learn to emulate the performance of much larger teachers), the underlying mechanisms by which KD improves generative quality remain poorly understood. In this work, we present a minimal working explanation of KD in generative modeling. Using a controlled simulation with mixtures of Gaussians, we demonstrate that distillation induces a trade-off between precision and recall in the student model: as the teacher distribution becomes more selective, the student concentrates more probability mass on high-likelihood regions at the expense of coverage, a behavior modulated by a single entropy-controlling parameter. We then validate this effect in a large-scale language modeling setup using the SmolLM2 family of models. The empirical results reveal the same precision-recall dynamics observed in simulation, where precision corresponds to sample quality and recall to distributional coverage. In LLMs, this precision-recall trade-off is especially beneficial in scenarios where sample quality matters more than diversity, such as instruction tuning or downstream generation. Our analysis provides a simple and general explanation for the effectiveness of KD in generative modeling.
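
The sketch below is a minimal illustration of the mechanism the abstract describes, not the paper's actual experimental setup: a mixture-of-Gaussians teacher is made more selective by tempering its mixing weights with a parameter `tau` (the entropy-controlling knob is an assumption here), an under-capacity single-Gaussian student is fit by maximum likelihood to samples from the sharpened teacher, and precision/recall are proxied by cross log-likelihoods (student samples scored under the original teacher, and vice versa). The specific mixture parameters and proxies are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Teacher: a 1-D mixture of Gaussians (illustrative parameters).
means = np.array([-4.0, 0.0, 4.0])
stds = np.array([1.0, 1.0, 1.0])
weights = np.array([0.2, 0.5, 0.3])

def mixture_logpdf(x, means, stds, weights):
    # log p(x) = logsumexp_k [ log w_k + log N(x; mu_k, sigma_k) ]
    x = np.atleast_1d(x)[:, None]
    comp = (
        np.log(weights)
        - 0.5 * np.log(2 * np.pi * stds ** 2)
        - 0.5 * ((x - means) / stds) ** 2
    )
    m = comp.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(comp - m).sum(axis=1, keepdims=True))).ravel()

def sample_mixture(n, means, stds, weights):
    ks = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(means[ks], stds[ks])

def sharpen(weights, tau):
    # Entropy-controlling parameter: tau < 1 makes the teacher more selective.
    w = weights ** (1.0 / tau)
    return w / w.sum()

for tau in [1.0, 0.5, 0.2]:
    # "Distill": sample from the sharpened teacher and fit a single-Gaussian
    # student by maximum likelihood (a deliberately under-capacity student).
    sharp_w = sharpen(weights, tau)
    distill_data = sample_mixture(20_000, means, stds, sharp_w)
    mu_s, sigma_s = distill_data.mean(), distill_data.std()

    # Precision proxy: likelihood of student samples under the ORIGINAL teacher.
    student_samples = rng.normal(mu_s, sigma_s, size=20_000)
    precision = mixture_logpdf(student_samples, means, stds, weights).mean()

    # Recall proxy: likelihood of original-teacher samples under the student.
    teacher_samples = sample_mixture(20_000, means, stds, weights)
    recall = (
        -0.5 * np.log(2 * np.pi * sigma_s ** 2)
        - 0.5 * ((teacher_samples - mu_s) / sigma_s) ** 2
    ).mean()

    print(f"tau={tau:.1f}  precision~{precision:.2f}  recall~{recall:.2f}")
```

Running this, lowering `tau` concentrates the sharpened teacher on its dominant mode, so the student's samples score higher under the original teacher (precision rises) while the original teacher's outlying modes score lower under the student (recall falls), mirroring the trade-off described in the abstract.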

Cite

Text

Cha and Cho. "Why Knowledge Distillation Works in Generative Models: A Minimal Working Explanation." Advances in Neural Information Processing Systems, 2025.

Markdown

[Cha and Cho. "Why Knowledge Distillation Works in Generative Models: A Minimal Working Explanation." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/cha2025neurips-knowledge/)

BibTeX

@inproceedings{cha2025neurips-knowledge,
  title     = {{Why Knowledge Distillation Works in Generative Models: A Minimal Working Explanation}},
  author    = {Cha, Sungmin and Cho, Kyunghyun},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/cha2025neurips-knowledge/}
}