Progressive Distillation Improves Feature Learning via Implicit Curriculum

Abstract

Knowledge distillation, where a student model learns from a teacher model, is a widely adopted approach to improve the training of small models. A known challenge is that a large teacher-student performance gap can hurt the effectiveness of distillation, which prior works have aimed to mitigate by providing intermediate supervision. In this work, we study a popular approach called _progressive distillation_, where several intermediate checkpoints of the teacher are used successively to supervise the student as it learns. Using sparse parity as a testbed, we show empirically and theoretically that these intermediate checkpoints constitute an implicit curriculum that accelerates student learning. This curriculum provides explicit supervision for learning the underlying features of the task, supervision that, importantly, a fully trained teacher does not provide.
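
For readers unfamiliar with the setup, below is a minimal sketch of progressive distillation on a sparse-parity task, assuming a standard PyTorch training loop. The architectures, checkpoint schedule, support size, and hyperparameters are illustrative placeholders and are not taken from the paper; the teacher checkpoints here are randomly initialized stand-ins for checkpoints that would be saved during teacher training.

```python
# Illustrative sketch (assumed setup, not the paper's exact configuration).
import torch
import torch.nn as nn

d, k = 100, 6                      # input dimension and parity support size (assumed)
support = torch.arange(k)          # coordinates defining the sparse parity

def sample_batch(n):
    x = torch.randint(0, 2, (n, d)).float() * 2 - 1   # uniform {-1, +1}^d inputs
    y = (x[:, support].prod(dim=1) > 0).long()        # parity of the support coordinates
    return x, y

def mlp(width):
    return nn.Sequential(nn.Linear(d, width), nn.ReLU(), nn.Linear(width, 2))

# Intermediate teacher checkpoints would normally be saved while training the
# teacher; random stand-ins are used here purely for illustration.
teacher_ckpts = [mlp(1024) for _ in range(3)]

student = mlp(64)
opt = torch.optim.SGD(student.parameters(), lr=0.1)
kl = nn.KLDivLoss(reduction="batchmean")

steps_per_ckpt = 1000
for teacher in teacher_ckpts:      # successively distill from each checkpoint
    teacher.eval()
    for _ in range(steps_per_ckpt):
        x, _ = sample_batch(256)
        with torch.no_grad():
            soft = torch.softmax(teacher(x), dim=-1)   # teacher's soft labels
        loss = kl(torch.log_softmax(student(x), dim=-1), soft)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

The key design point is the outer loop: rather than matching a single fully trained teacher, the student matches a sequence of intermediate checkpoints, which is what supplies the implicit curriculum studied in the paper.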

Cite

Text

Panigrahi et al. "Progressive Distillation Improves Feature Learning via Implicit Curriculum." ICML 2024 Workshops: TF2M, 2024.

Markdown

[Panigrahi et al. "Progressive Distillation Improves Feature Learning via Implicit Curriculum." ICML 2024 Workshops: TF2M, 2024.](https://mlanthology.org/icmlw/2024/panigrahi2024icmlw-progressive-a/)

BibTeX

@inproceedings{panigrahi2024icmlw-progressive-a,
  title     = {{Progressive Distillation Improves Feature Learning via Implicit Curriculum}},
  author    = {Panigrahi, Abhishek and Liu, Bingbin and Malladi, Sadhika and Risteski, Andrej and Goel, Surbhi},
  booktitle = {ICML 2024 Workshops: TF2M},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/panigrahi2024icmlw-progressive-a/}
}