DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs

Abstract

Despite the success of distillation in large language models (LLMs), most prior work applies identical loss functions to both teacher- and student-generated data. These strategies overlook the synergy between loss formulations and data types, leading to a suboptimal performance boost in student models. To address this, we propose DistiLLM-2, a contrastive approach that simultaneously increases the likelihood of teacher responses and decreases that of student responses by harnessing this synergy. Our extensive experiments show that DistiLLM-2 not only builds high-performing student models across a wide range of tasks, including instruction-following and code generation, but also supports diverse applications, such as preference alignment and vision-language extensions. These findings highlight the potential of a contrastive approach to enhance the efficacy of LLM distillation by effectively aligning teacher and student models across varied data types.
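The abstract's core idea, raising the student's likelihood on teacher-generated responses while lowering it on student-generated responses, can be illustrated with a minimal PyTorch-style sketch. Everything below is an illustrative assumption rather than the paper's exact objective: the function name, the choice of forward KL on teacher responses versus reverse KL on student responses, and the weighting term alpha are placeholders for the contrastive pairing described in the abstract.

import torch
import torch.nn.functional as F

def contrastive_distill_loss(student_logits_teacher_resp,
                             teacher_logits_teacher_resp,
                             student_logits_student_resp,
                             teacher_logits_student_resp,
                             alpha=0.5):
    # On teacher-generated responses: pull the student toward the teacher,
    # i.e., increase the student's likelihood of teacher outputs
    # (forward KL with the teacher distribution as target).
    p_teacher = F.softmax(teacher_logits_teacher_resp, dim=-1)
    log_q_student = F.log_softmax(student_logits_teacher_resp, dim=-1)
    pull = F.kl_div(log_q_student, p_teacher, reduction="batchmean")

    # On student-generated responses: penalize probability mass the student
    # places where the teacher assigns low probability, which decreases the
    # student's likelihood of its own (lower-quality) outputs
    # (reverse KL with the student distribution being penalized).
    log_p_teacher = F.log_softmax(teacher_logits_student_resp, dim=-1)
    q_student = F.softmax(student_logits_student_resp, dim=-1)
    log_q_student_own = F.log_softmax(student_logits_student_resp, dim=-1)
    push = (q_student * (log_q_student_own - log_p_teacher)).sum(-1).mean()

    # alpha balances the two terms; the actual weighting/curriculum used by
    # DistiLLM-2 is described in the paper, not reproduced here.
    return alpha * pull + (1.0 - alpha) * push

The key design point the sketch tries to convey is the pairing of loss formulation with data type: a mode-covering term on teacher responses and a mode-seeking term on student responses, rather than one identical loss applied to both.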

Cite

Text

Ko et al. "DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Ko et al. "DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/ko2025icml-distillm2/)

BibTeX

@inproceedings{ko2025icml-distillm2,
  title     = {{DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs}},
  author    = {Ko, Jongwoo and Chen, Tianyi and Kim, Sungnyun and Ding, Tianyu and Liang, Luming and Zharkov, Ilya and Yun, Se-Young},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {31044--31062},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/ko2025icml-distillm2/}
}