Scaffolding a Student to Instill Knowledge
Abstract
We propose a novel knowledge distillation (KD) method to selectively instill teacher knowledge into a student model, motivated by situations where the student's capacity is significantly smaller than that of the teacher. In vanilla KD, the teacher primarily sets a predictive target for the student to follow, and we posit that this target is overly optimistic due to the student's lack of capacity. We develop a novel scaffolding scheme where the teacher, in addition to setting a predictive target, also scaffolds the student's prediction by censoring hard-to-learn examples. Scaffolding uses the same inputs as vanilla KD, namely the teacher's softmax predictions, and in this sense our proposal can be viewed as a natural variant of vanilla KD. We show on synthetic examples that censoring hard examples smooths the student's loss landscape so that the student encounters fewer local minima and, as a result, generalizes better. On benchmark datasets, our method improves over vanilla KD and is comparable to more intrusive techniques that leverage feature matching.
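The sketch below illustrates the censoring idea in a minimal PyTorch-style loss, assuming a simple teacher-confidence threshold as the criterion for marking an example as hard-to-learn; the threshold rule, hyperparameters, and function name are illustrative assumptions, not the exact scaffolding criterion from the paper.

```python
import torch
import torch.nn.functional as F

def scaffolded_kd_loss(student_logits, teacher_logits, labels,
                       temperature=4.0, confidence_threshold=0.5, alpha=0.5):
    """Distillation loss in which hard-to-learn examples, as judged by the
    teacher's softmax confidence on the true class, are censored (excluded)
    from the distillation target. Illustrative sketch, not the paper's method."""
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # Teacher confidence on the ground-truth class; low confidence marks a "hard" example.
    teacher_conf = teacher_probs.gather(1, labels.unsqueeze(1)).squeeze(1)
    keep = (teacher_conf >= confidence_threshold).float()

    # Soft-target KD term, applied only to the kept (non-censored) examples.
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd_per_example = F.kl_div(log_student, teacher_probs, reduction="none").sum(dim=-1)
    kd_loss = (keep * kd_per_example).sum() / keep.sum().clamp(min=1.0)

    # Standard cross-entropy on ground-truth labels for all examples.
    ce_loss = F.cross_entropy(student_logits, labels)
    return alpha * (temperature ** 2) * kd_loss + (1 - alpha) * ce_loss
```

In this sketch, censored examples still contribute to the cross-entropy term but are dropped from the soft-target term, so the student is not forced to match teacher predictions it lacks the capacity to reproduce.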
Cite
Text
Kag et al. "Scaffolding a Student to Instill Knowledge." International Conference on Learning Representations, 2023.
Markdown
[Kag et al. "Scaffolding a Student to Instill Knowledge." International Conference on Learning Representations, 2023.](https://mlanthology.org/iclr/2023/kag2023iclr-scaffolding/)
BibTeX
@inproceedings{kag2023iclr-scaffolding,
  title     = {{Scaffolding a Student to Instill Knowledge}},
  author    = {Kag, Anil and Acar, Durmus Alp Emre and Gangrade, Aditya and Saligrama, Venkatesh},
  booktitle = {International Conference on Learning Representations},
  year      = {2023},
  url       = {https://mlanthology.org/iclr/2023/kag2023iclr-scaffolding/}
}