Learn from One Specialized Sub-Teacher: One-to-One Mapping for Feature-Based Knowledge Distillation

Abstract

Knowledge distillation is an effective technique for compressing over-parameterized language models. In this work, we propose to break down the global feature distillation task into N local sub-tasks. In this new framework, we consider each neuron in the last hidden layer of the teacher network as a specialized sub-teacher, and each neuron in the last hidden layer of the student network as a focused sub-student. Each focused sub-student learns only from its corresponding specialized sub-teacher and ignores the others, which simplifies the sub-student's task and keeps it focused. The method is novel and can be combined with other distillation techniques. Empirical results show that our proposed approach outperforms state-of-the-art methods, achieving higher performance on most benchmark datasets.
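
The sketch below illustrates the one-to-one mapping idea described in the abstract: each neuron (column) of the student's last hidden layer is distilled only against the matching neuron of the teacher. It is a minimal illustration, not the paper's implementation; the function name, the per-neuron normalization, the MSE sub-loss, and the assumption of equal hidden widths are all assumptions made here for clarity.

```python
# Minimal sketch of one-to-one feature distillation (hypothetical helper).
# Assumes teacher and student last hidden layers have the same width and
# uses a simple per-neuron MSE; the paper's exact sub-loss may differ.
import torch
import torch.nn.functional as F


def one_to_one_feature_loss(teacher_hidden: torch.Tensor,
                            student_hidden: torch.Tensor) -> torch.Tensor:
    """teacher_hidden, student_hidden: (batch_size, hidden_dim) activations
    of the last hidden layer. Neuron i of the student (sub-student) is
    compared only with neuron i of the teacher (sub-teacher)."""
    assert teacher_hidden.shape == student_hidden.shape

    # Normalize each neuron's activations over the batch so every
    # sub-teacher / sub-student pair is compared on the same scale.
    t = (teacher_hidden - teacher_hidden.mean(0)) / (teacher_hidden.std(0) + 1e-6)
    s = (student_hidden - student_hidden.mean(0)) / (student_hidden.std(0) + 1e-6)

    # One local sub-task per neuron: distance between column i of the student
    # and column i of the teacher, averaged over the N neurons.
    per_neuron_loss = F.mse_loss(s, t, reduction="none").mean(dim=0)  # (hidden_dim,)
    return per_neuron_loss.mean()


# Usage (hypothetical weighting): combine with the task loss and, if desired,
# other distillation terms.
# total_loss = task_loss + lambda_feat * one_to_one_feature_loss(t_h, s_h)
```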

Cite

Text

Saadi et al. "Learn from One Specialized Sub-Teacher: One-to-One Mapping for Feature-Based Knowledge Distillation." ICML 2023 Workshops: NCW, 2023.

Markdown

[Saadi et al. "Learn from One Specialized Sub-Teacher: One-to-One Mapping for Feature-Based Knowledge Distillation." ICML 2023 Workshops: NCW, 2023.](https://mlanthology.org/icmlw/2023/saadi2023icmlw-learn/)

BibTeX

@inproceedings{saadi2023icmlw-learn,
  title     = {{Learn from One Specialized Sub-Teacher: One-to-One Mapping for Feature-Based Knowledge Distillation}},
  author    = {Saadi, Khouloud and Mitrović, Jelena and Granitzer, Michael},
  booktitle = {ICML 2023 Workshops: NCW},
  year      = {2023},
  url       = {https://mlanthology.org/icmlw/2023/saadi2023icmlw-learn/}
}