Improving Knowledge Distillation with Teacher's Explanation
Abstract
Knowledge distillation (KD) improves the performance of a low-complexity student model with the help of a more powerful teacher. The teacher in KD is a black-box model, imparting knowledge to the student only through its predictions, which limits the amount of transferred knowledge. In this work, we introduce a novel Knowledge Explaining Distillation (KED) framework, which allows the student to learn not only from the teacher's predictions but also from the teacher's explanations. We propose a class of superfeature-explaining teachers that provide explanations over groups of features, along with the corresponding student model. We also present a method for constructing the superfeatures. We then extend KED to reduce complexity in convolutional neural networks, to allow augmentation with hidden-representation distillation methods, and to work with a limited amount of training data using chimeric sets. Our experiments over a variety of datasets show that KED students can substantially outperform KD students of similar complexity.
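The abstract describes KED only at a high level, so the snippet below is a minimal sketch of how a KED-style training objective could look rather than the paper's actual formulation: a standard distillation loss (temperature-scaled KL plus cross-entropy) augmented with a term that pulls the student's explanation scores over superfeature groups toward the teacher's. The tensor shapes, the MSE matching term, and the hyperparameters `T`, `alpha`, and `beta` are illustrative assumptions, and how the explanation vectors are computed is left to the chosen explanation method.

```python
import torch
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Standard KD objective: softened KL to the teacher plus hard-label CE."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard


def ked_loss(student_logits, teacher_logits, labels,
             student_expl, teacher_expl, T=4.0, alpha=0.5, beta=1.0):
    """KED-style objective (sketch, not the paper's exact loss):
    the KD loss above plus a term matching the student's per-superfeature
    explanation scores to the teacher's.

    `student_expl` and `teacher_expl` are assumed to be
    (batch, num_superfeatures) attribution tensors produced by some
    explanation method over the superfeature groups.
    """
    expl_term = F.mse_loss(student_expl, teacher_expl)
    return kd_loss(student_logits, teacher_logits, labels, T, alpha) + beta * expl_term
```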
Cite
Text

Chowdhury et al. "Improving Knowledge Distillation with Teacher's Explanation." NeurIPS 2024 Workshops: Compression, 2024.

Markdown

[Chowdhury et al. "Improving Knowledge Distillation with Teacher's Explanation." NeurIPS 2024 Workshops: Compression, 2024.](https://mlanthology.org/neuripsw/2024/chowdhury2024neuripsw-improving/)

BibTeX
@inproceedings{chowdhury2024neuripsw-improving,
title = {{Improving Knowledge Distillation with Teacher's Explanation}},
author = {Chowdhury, Sayantan and Liang, Ben and Tizghadam, Ali and Albanese, Ilijc},
booktitle = {NeurIPS 2024 Workshops: Compression},
year = {2024},
url = {https://mlanthology.org/neuripsw/2024/chowdhury2024neuripsw-improving/}
}