Semi-Supervised Vision Transformers at Scale

Cai, Zhaowei; Ravichandran, Avinash; Favaro, Paolo; Wang, Manchen; Modolo, Davide; Bhotika, Rahul; Tu, Zhuowen; Soatto, Stefano

Semi-Supervised Vision Transformers at Scale

Zhaowei Cai, Avinash Ravichandran, Paolo Favaro, Manchen Wang, Davide Modolo, Rahul Bhotika, Zhuowen Tu, Stefano Soatto

NeurIPS 2022

/neurips/2022/cai2022neurips-semisupervised/

Abstract

We study semi-supervised learning (SSL) for vision transformers (ViT), an under-explored topic despite the wide adoption of the ViT architectures to different tasks. To tackle this problem, we use a SSL pipeline, consisting of first un/self-supervised pre-training, followed by supervised fine-tuning, and finally semi-supervised fine-tuning. At the semi-supervised fine-tuning stage, we adopt an exponential moving average (EMA)-Teacher framework instead of the popular FixMatch, since the former is more stable and delivers higher accuracy for semi-supervised vision transformers. In addition, we propose a probabilistic pseudo mixup mechanism to interpolate unlabeled samples and their pseudo labels for improved regularization, which is important for training ViTs with weak inductive bias. Our proposed method, dubbed Semi-ViT, achieves comparable or better performance than the CNN counterparts in the semi-supervised classification setting. Semi-ViT also enjoys the scalability benefits of ViTs that can be readily scaled up to large-size models with increasing accuracy. For example, Semi-ViT-Huge achieves an impressive 80\% top-1 accuracy on ImageNet using only 1\% labels, which is comparable with Inception-v4 using 100\% ImageNet labels. The code is available at https://github.com/amazon-science/semi-vit.

PDF NeurIPS OpenReview Semantic Scholar

Cite

Text

Cai et al. "Semi-Supervised Vision Transformers at Scale." Neural Information Processing Systems, 2022.

Markdown

[Cai et al. "Semi-Supervised Vision Transformers at Scale." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/cai2022neurips-semisupervised/)

BibTeX

@inproceedings{cai2022neurips-semisupervised,
  title     = {{Semi-Supervised Vision Transformers at Scale}},
  author    = {Cai, Zhaowei and Ravichandran, Avinash and Favaro, Paolo and Wang, Manchen and Modolo, Davide and Bhotika, Rahul and Tu, Zhuowen and Soatto, Stefano},
  booktitle = {Neural Information Processing Systems},
  year      = {2022},
  url       = {https://mlanthology.org/neurips/2022/cai2022neurips-semisupervised/}
}