SEED: Self-Supervised Distillation for Visual Representation

Abstract

This paper is concerned with self-supervised learning for small models. It is motivated by our empirical finding that while widely used contrastive self-supervised learning methods have made great progress on large model training, they do not work well for small models. To address this problem, we propose a new learning paradigm, named SElf-SupErvised Distillation (SEED), in which we leverage a larger network (the teacher) to transfer its representational knowledge to a smaller architecture (the student) in a self-supervised fashion. Instead of learning directly from unlabeled data, we train the student encoder to mimic the similarity score distribution that the teacher infers over a set of instances. We show that SEED dramatically boosts the performance of small networks on downstream tasks. Compared with self-supervised baselines, SEED improves top-1 accuracy on the ImageNet-1k dataset from 42.2% to 67.6% for EfficientNet-B0 and from 36.3% to 68.2% for MobileNet-v3-Large.
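To make the "mimic the similarity score distribution" objective concrete, below is a minimal PyTorch sketch of a SEED-style distillation loss. It assumes L2-normalized embeddings and a memory queue of past teacher embeddings; the function name, temperature defaults, and queue handling are illustrative assumptions rather than the authors' released code.

```python
import torch
import torch.nn.functional as F

def seed_loss(z_s, z_t, queue, tau_s=0.2, tau_t=0.07):
    """SEED-style distillation loss (simplified sketch).

    z_s:   (B, D) L2-normalized student embeddings of a batch of images.
    z_t:   (B, D) L2-normalized teacher embeddings of the same images.
    queue: (K, D) L2-normalized memory bank of past teacher embeddings.
    tau_s, tau_t: student/teacher softmax temperatures (hypothetical
        defaults; the paper tunes these as hyperparameters).
    """
    # Similarity of each image to every instance in the queue: (B, K).
    logits_s = z_s @ queue.t()
    logits_t = z_t @ queue.t()

    # Append the teacher's own embedding as an extra slot, so the
    # target distribution peaks at the image itself.
    self_s = (z_s * z_t).sum(dim=1, keepdim=True)  # (B, 1)
    self_t = (z_t * z_t).sum(dim=1, keepdim=True)  # (B, 1), equals 1
    logits_s = torch.cat([logits_s, self_s], dim=1)
    logits_t = torch.cat([logits_t, self_t], dim=1)

    # Student mimics the teacher's similarity distribution:
    # cross-entropy between the two temperature-softened softmaxes.
    p_t = F.softmax(logits_t / tau_t, dim=1)
    log_p_s = F.log_softmax(logits_s / tau_s, dim=1)
    return -(p_t * log_p_s).sum(dim=1).mean()

# Usage with dummy data: normalize embeddings before calling.
B, D, K = 32, 128, 4096
z_s = F.normalize(torch.randn(B, D), dim=1)
z_t = F.normalize(torch.randn(B, D), dim=1)
queue = F.normalize(torch.randn(K, D), dim=1)
loss = seed_loss(z_s, z_t, queue)
```

Note that only the student receives gradients in this setup; the teacher is a frozen pretrained encoder, and the queue is updated with teacher embeddings after each step.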

Cite

Text

Fang et al. "SEED: Self-Supervised Distillation for Visual Representation." International Conference on Learning Representations, 2021.

Markdown

[Fang et al. "SEED: Self-Supervised Distillation for Visual Representation." International Conference on Learning Representations, 2021.](https://mlanthology.org/iclr/2021/fang2021iclr-seed/)

BibTeX

@inproceedings{fang2021iclr-seed,
  title     = {{SEED: Self-Supervised Distillation for Visual Representation}},
  author    = {Fang, Zhiyuan and Wang, Jianfeng and Wang, Lijuan and Zhang, Lei and Yang, Yezhou and Liu, Zicheng},
  booktitle = {International Conference on Learning Representations},
  year      = {2021},
  url       = {https://mlanthology.org/iclr/2021/fang2021iclr-seed/}
}