Masked Autoencoders Are Stronger Knowledge Distillers

Abstract

Knowledge distillation (KD) has shown great success in improving a student's performance on fine-grained visual tasks, e.g., object detection, by mimicking the intermediate outputs of a high-capacity teacher. This paper proposes Masked Knowledge Distillation (MKD), a technique that enhances this process with a masked autoencoding scheme. In MKD, random patches of the input image are masked, and the corresponding missing features are recovered by forcing them to imitate the teacher's output. MKD is built on two core designs. First, using the student as the encoder, we develop an adaptive decoder architecture consisting of a spatial alignment module that operates on the multi-scale features of the feature pyramid network (FPN), a simple decoder, and a spatial recovery module that reconstructs the teacher's output from the latent representation and mask tokens. Second, we introduce masked convolution in each convolution block to keep the masked patches unaffected by the others. Coupling these two designs further improves the completeness and effectiveness of learning the teacher's knowledge. We conduct extensive experiments with different architectures on object detection and semantic segmentation. The results show that all students achieve further improvements over conventional KD. Notably, we establish new state-of-the-art results, boosting RetinaNet with ResNet-18 and ResNet-50 backbones from 33.4 to 37.5 mAP and from 37.4 to 41.5 mAP, respectively.
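
The masked convolution mentioned above admits a compact illustration. The following is a minimal PyTorch-style sketch, not the authors' implementation: the class name MaskedConv2d, the mask convention (1 = visible patch, 0 = masked patch), and the zero-before/zero-after rule are assumptions made for illustration of how features at masked positions can be kept from mixing with visible ones.

import torch
import torch.nn as nn

class MaskedConv2d(nn.Module):
    """Convolution that zeroes masked positions before and after the conv,
    so masked patches are not influenced by visible ones (illustrative
    sketch only; not the exact module from the paper)."""

    def __init__(self, in_channels, out_channels, kernel_size=3, padding=1):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, padding=padding)

    def forward(self, x, mask):
        # x: (B, C, H, W) feature map; mask: (B, 1, H, W), 1 = visible, 0 = masked
        x = x * mask          # drop features at masked positions before convolving
        x = self.conv(x)
        return x * mask       # keep masked positions zero after the convolution

Usage under these assumptions: for a feature map feat = torch.randn(2, 256, 32, 32) and a binary mask = (torch.rand(2, 1, 32, 32) > 0.5).float(), calling MaskedConv2d(256, 256)(feat, mask) returns a feature map whose masked positions remain zero, so subsequent blocks cannot leak information across the mask boundary.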

Cite

Text

Lao et al. "Masked Autoencoders Are Stronger Knowledge Distillers." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.00587

Markdown

[Lao et al. "Masked Autoencoders Are Stronger Knowledge Distillers." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/lao2023iccv-masked/) doi:10.1109/ICCV51070.2023.00587

BibTeX

@inproceedings{lao2023iccv-masked,
  title     = {{Masked Autoencoders Are Stronger Knowledge Distillers}},
  author    = {Lao, Shanshan and Song, Guanglu and Liu, Boxiao and Liu, Yu and Yang, Yujiu},
  booktitle = {International Conference on Computer Vision},
  year      = {2023},
  pages     = {6384--6393},
  doi       = {10.1109/ICCV51070.2023.00587},
  url       = {https://mlanthology.org/iccv/2023/lao2023iccv-masked/}
}