SLaM: Student-Label Mixing for Distillation with Unlabeled Examples

Abstract

Knowledge distillation with unlabeled examples is a powerful training paradigm for generating compact and lightweight student models in applications where the amount of labeled data is limited but one has access to a large pool of unlabeled data. In this setting, a large teacher model generates "soft" pseudo-labels for the unlabeled dataset which are then used for training the student model. Despite its success in a wide variety of applications, a shortcoming of this approach is that the teacher's pseudo-labels are often noisy, leading to impaired student performance. In this paper, we present a principled method for knowledge distillation with unlabeled examples that we call Student-Label Mixing (SLaM) and we show that it consistently improves over prior approaches by evaluating it on several standard benchmarks. Finally, we show that SLaM comes with theoretical guarantees; along the way we give an algorithm improving the best-known sample complexity for learning halfspaces with margin under random classification noise, and provide the first convergence analysis for so-called "forward loss-adjustment" methods.
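
To make the distillation-with-unlabeled-examples setup concrete, below is a minimal, illustrative sketch in PyTorch: a frozen teacher produces soft pseudo-labels for an unlabeled pool, and a smaller student is trained against them. The student-label mixing step shown (a convex combination of the teacher's pseudo-labels and the student's own predictions, weighted by a hypothetical coefficient alpha) is an assumption made for illustration only; it is not the exact SLaM update derived in the paper.

# Illustrative sketch only; the mixing rule and `alpha` are assumptions,
# not the paper's exact SLaM formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy setup: a frozen "teacher" and a smaller "student" on 20-dim inputs, 5 classes.
teacher = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 5))
student = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 5))
for p in teacher.parameters():
    p.requires_grad_(False)

unlabeled_x = torch.randn(256, 20)            # large pool of unlabeled examples
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
alpha = 0.7                                   # hypothetical mixing weight (assumption)

for step in range(100):
    with torch.no_grad():
        # Teacher's "soft" pseudo-labels; in practice these are often noisy.
        teacher_soft = F.softmax(teacher(unlabeled_x), dim=-1)
    student_logits = student(unlabeled_x)
    with torch.no_grad():
        student_soft = F.softmax(student_logits, dim=-1)
    # Mix teacher pseudo-labels with the student's own predictions to form
    # the training target (the "student-label mixing" idea; exact form assumed here).
    target = alpha * teacher_soft + (1.0 - alpha) * student_soft
    loss = F.kl_div(F.log_softmax(student_logits, dim=-1), target, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

In such a sketch, alpha controls how much the student trusts the teacher's noisy pseudo-labels versus its own current predictions; how that trade-off is set and adjusted is precisely the kind of question the paper addresses.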

Cite

Text

Kontonis et al. "SLaM: Student-Label Mixing for Distillation with Unlabeled Examples." Neural Information Processing Systems, 2023.

Markdown

[Kontonis et al. "SLaM: Student-Label Mixing for Distillation with Unlabeled Examples." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/kontonis2023neurips-slam/)

BibTeX

@inproceedings{kontonis2023neurips-slam,
  title     = {{SLaM: Student-Label Mixing for Distillation with Unlabeled Examples}},
  author    = {Kontonis, Vasilis and Iliopoulos, Fotis and Trinh, Khoa and Baykal, Cenk and Menghani, Gaurav and Vee, Erik},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/kontonis2023neurips-slam/}
}