Feature Normalized Knowledge Distillation for Image Classification

Abstract

Knowledge Distillation (KD) transfers knowledge from a cumbersome teacher model to a lightweight student network. Since a single image may reasonably relate to several categories, one-hot labels inevitably introduce encoding noise. From this perspective, we systematically analyze the distillation mechanism and show that the L2-norm of the penultimate-layer feature becomes too large under the influence of label noise, and that the temperature T in KD can be regarded as a correction factor on this L2-norm that suppresses the impact of the noise. Noticing that different samples suffer from varying intensities of label noise, we further propose a simple yet effective feature normalized knowledge distillation, which introduces a sample-specific correction factor in place of the unified temperature T to better reduce the impact of noise. Extensive experiments show that the proposed method surpasses standard KD as well as self-distillation significantly on the CIFAR-100, CUB-200-2011 and Stanford Cars datasets. The code is available at https://github.com/aztc/FNKD
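To make the idea in the abstract concrete, below is a minimal PyTorch sketch of one way to use each sample's penultimate-feature L2-norm as its own softening factor in place of a single temperature T. The function name, the exact scaling rule, and the loss weighting are illustrative assumptions, not the authors' released implementation; see the repository linked above for the official code.

import torch
import torch.nn.functional as F

def feature_normalized_kd_loss(student_feat, student_logits, teacher_logits,
                               labels, alpha=0.9, tau=1.0):
    # Illustrative sketch only; not the authors' reference implementation.
    # student_feat:   (B, D) penultimate-layer features of the student
    # student_logits: (B, C) student logits
    # teacher_logits: (B, C) logits from the frozen teacher

    # Per-sample L2-norm of the penultimate feature, used here as the
    # sample-specific correction factor replacing the unified temperature T.
    feat_norm = student_feat.norm(p=2, dim=1, keepdim=True).clamp_min(1e-8)  # (B, 1)

    # Soften the student's logits by its own feature norm (scaled by tau),
    # so samples with larger feature norms receive stronger smoothing.
    soft_student = F.log_softmax(student_logits / (tau * feat_norm), dim=1)
    soft_teacher = F.softmax(teacher_logits / tau, dim=1)

    # KL divergence to the teacher's soft targets plus the usual hard-label CE.
    kd_term = F.kl_div(soft_student, soft_teacher, reduction="batchmean")
    ce_term = F.cross_entropy(student_logits, labels)
    return alpha * kd_term + (1.0 - alpha) * ce_term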

Cite

Text

Xu et al. "Feature Normalized Knowledge Distillation for Image Classification." Proceedings of the European Conference on Computer Vision (ECCV), 2020. doi:10.1007/978-3-030-58595-2_40

Markdown

[Xu et al. "Feature Normalized Knowledge Distillation for Image Classification." Proceedings of the European Conference on Computer Vision (ECCV), 2020.](https://mlanthology.org/eccv/2020/xu2020eccv-feature/) doi:10.1007/978-3-030-58595-2_40

BibTeX

@inproceedings{xu2020eccv-feature,
  title     = {{Feature Normalized Knowledge Distillation for Image Classification}},
  author    = {Xu, Kunran and Rui, Lai and Li, Yishi and Gu, Lin},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2020},
  doi       = {10.1007/978-3-030-58595-2_40},
  url       = {https://mlanthology.org/eccv/2020/xu2020eccv-feature/}
}