Feature Normalized Knowledge Distillation for Image Classification
Abstract
Knowledge Distillation (KD) transfers knowledge from a cumbersome teacher model to a lightweight student network. Since a single image may reasonably relate to several categories, the one-hot label inevitably introduces encoding noise. From this perspective, we systematically analyze the distillation mechanism and demonstrate that the L2-norm of the feature in the penultimate layer becomes too large under the influence of label noise, and that the temperature T in KD can be regarded as a correction factor for this L2-norm that suppresses the impact of noise. Noticing that different samples suffer from varying intensities of label noise, we further propose a simple yet effective feature normalized knowledge distillation, which introduces a sample-specific correction factor in place of the unified temperature T to better reduce the impact of noise. Extensive experiments show that the proposed method significantly surpasses standard KD as well as self-distillation on the CIFAR-100, CUB-200-2011 and Stanford Cars datasets. The code is available at https://github.com/aztc/FNKD.
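The abstract's core idea can be summarized in a short loss function: L2-normalize the student's penultimate feature and rescale it with a per-sample factor derived from the teacher's feature norm, instead of dividing all logits by a single temperature T. The sketch below is an illustration of that idea under stated assumptions, not the authors' reference implementation (see the GitHub repository for that); the function name `feature_normalized_kd_loss`, the choice of the teacher feature norm as the correction factor, and the loss weighting `alpha` are hypothetical.

```python
# Minimal PyTorch sketch of feature normalized knowledge distillation as
# described in the abstract. Assumptions (not taken from the paper's code):
# the student's penultimate feature is L2-normalized and rescaled by the
# teacher's per-sample feature norm before the classifier, replacing the
# single temperature T of standard KD.
import torch
import torch.nn.functional as F

def feature_normalized_kd_loss(student_feat, student_fc, teacher_feat,
                               teacher_logits, labels, alpha=0.9, eps=1e-8):
    # Per-sample L2 norm of the teacher's penultimate feature, used here
    # as the sample-specific correction factor. Shape: (batch, 1).
    t_norm = teacher_feat.norm(p=2, dim=1, keepdim=True)

    # L2-normalize the student's penultimate feature, then rescale it by
    # the teacher's norm before the final linear classifier.
    s_feat = F.normalize(student_feat, p=2, dim=1, eps=eps)
    student_logits = student_fc(s_feat * t_norm)

    # Soft-label term: match the teacher's predicted class distribution.
    soft_targets = F.softmax(teacher_logits, dim=1)
    kd = F.kl_div(F.log_softmax(student_logits, dim=1),
                  soft_targets, reduction="batchmean")

    # Hard-label term: standard cross-entropy with the one-hot labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```

In this sketch `student_fc` is the student's final `nn.Linear` layer and `student_feat` / `teacher_feat` are the penultimate-layer activations of the two networks; because the correction factor is computed per sample, examples whose features carry a larger norm (and, per the abstract, stronger label noise) are scaled differently rather than all sharing one global T.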
Cite
Text
Xu et al. "Feature Normalized Knowledge Distillation for Image Classification." Proceedings of the European Conference on Computer Vision (ECCV), 2020. doi:10.1007/978-3-030-58595-2_40
Markdown
[Xu et al. "Feature Normalized Knowledge Distillation for Image Classification." Proceedings of the European Conference on Computer Vision (ECCV), 2020.](https://mlanthology.org/eccv/2020/xu2020eccv-feature/) doi:10.1007/978-3-030-58595-2_40
BibTeX
@inproceedings{xu2020eccv-feature,
title = {{Feature Normalized Knowledge Distillation for Image Classification}},
author = {Xu, Kunran and Rui, Lai and Li, Yishi and Gu, Lin},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2020},
doi = {10.1007/978-3-030-58595-2_40},
url = {https://mlanthology.org/eccv/2020/xu2020eccv-feature/}
}