Self-Boosting for Feature Distillation

Abstract

Knowledge distillation is a simple but effective method for model compression: a small network (the Student) learns from a well-trained large network (the Teacher) to obtain better performance. However, when the difference in model size between the Student and the Teacher is large, the capacity gap degrades the Student's performance. Existing methods focus on extracting simplified or more effective knowledge from the Teacher to narrow the Teacher-Student gap, whereas we address this problem through the Student's self-boosting. Specifically, we propose a novel distillation method named Self-boosting Feature Distillation (SFD), which eases the Teacher-Student gap via feature integration and self-distillation of the Student. Three different modules are designed for feature integration to enhance the discriminability of the Student's features, which in theory improves the order of convergence. Moreover, an easy-to-operate self-distillation strategy is put forward to stabilize the training process and improve the Student's performance without additional forward propagation or memory consumption. Extensive experiments on multiple benchmarks and networks show that our method significantly outperforms existing methods.
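
To make the two ingredients mentioned in the abstract concrete, below is a minimal PyTorch-style sketch of (i) a feature-distillation loss that aligns a projected Student feature map with the Teacher's, and (ii) a self-distillation term that reuses the Student's own cached soft predictions as extra targets, so no second forward pass through the Student is needed. The class and function names, the 1x1 projection layer, and the caching strategy are illustrative assumptions for a generic setup, not the exact SFD formulation from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureDistillationLoss(nn.Module):
    """Illustrative feature-distillation loss (not the exact SFD modules).

    A 1x1 convolution projects the Student feature map to the Teacher's
    channel dimension, and an L2 loss aligns the two feature maps.
    """

    def __init__(self, s_channels: int, t_channels: int):
        super().__init__()
        self.proj = nn.Conv2d(s_channels, t_channels, kernel_size=1)

    def forward(self, f_student: torch.Tensor, f_teacher: torch.Tensor) -> torch.Tensor:
        # Teacher features are treated as fixed targets.
        return F.mse_loss(self.proj(f_student), f_teacher.detach())


def self_distillation_loss(logits: torch.Tensor,
                           cached_probs: torch.Tensor,
                           temperature: float = 4.0) -> torch.Tensor:
    """Hypothetical self-distillation term: the Student's own soft
    predictions, cached from an earlier epoch, serve as additional soft
    targets, avoiding any extra forward propagation in the current step."""
    log_p = F.log_softmax(logits / temperature, dim=1)
    return F.kl_div(log_p, cached_probs, reduction="batchmean") * temperature ** 2
```

In practice, such losses would be added to the usual cross-entropy and logit-distillation objectives with weighting coefficients chosen on a validation set.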

Cite

Text

Pei et al. "Self-Boosting for Feature Distillation." International Joint Conference on Artificial Intelligence, 2021. doi:10.24963/IJCAI.2021/131

Markdown

[Pei et al. "Self-Boosting for Feature Distillation." International Joint Conference on Artificial Intelligence, 2021.](https://mlanthology.org/ijcai/2021/pei2021ijcai-self/) doi:10.24963/IJCAI.2021/131

BibTeX

@inproceedings{pei2021ijcai-self,
  title     = {{Self-Boosting for Feature Distillation}},
  author    = {Pei, Yulong and Qu, Yanyun and Zhang, Junping},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2021},
  pages     = {945--951},
  doi       = {10.24963/IJCAI.2021/131},
  url       = {https://mlanthology.org/ijcai/2021/pei2021ijcai-self/}
}