Data-Efficient Language-Supervised Zero-Shot Learning with Self-Distillation
Abstract
Traditional computer vision models are trained to predict a fixed set of predefined categories. Recently, natural language has been shown to be a broader and richer source of supervision, providing finer descriptions of visual concepts than supervised "gold" labels. Previous works, such as CLIP, use a simple pretraining task of predicting the pairings between images and text captions. CLIP, however, is data hungry and requires more than 400M image-text pairs for training. We propose a data-efficient contrastive distillation method that uses soft labels to learn from noisy image-text pairs. Our model transfers knowledge from pre-trained image and sentence encoders and achieves strong performance with only 3M image-text pairs, 133x smaller than CLIP. Our method exceeds the previous SoTA for general zero-shot learning on ImageNet 21k+1k by 73% relatively with a ResNet50 image encoder and DeCLUTR text encoder. We also beat CLIP by 10.5% relatively on zero-shot evaluation on Google Open Images (19,958 classes).
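The core idea described above is a CLIP-style batch-wise contrastive objective whose one-hot image-caption pairing targets are blended with soft targets produced by frozen pre-trained teacher encoders. The sketch below illustrates that idea only, under assumed details: the function name `soft_contrastive_loss`, the mixing weight `alpha`, and the two temperatures are illustrative choices, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def soft_contrastive_loss(img_emb, txt_emb, teacher_img_emb, teacher_txt_emb,
                          tau_student=0.07, tau_teacher=0.1, alpha=0.5):
    """Contrastive loss over a batch of image-text pairs, mixing hard one-hot
    targets with soft targets from frozen teacher encoders (a sketch)."""
    # L2-normalize embeddings so dot products are cosine similarities.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    teacher_img_emb = F.normalize(teacher_img_emb, dim=-1)
    teacher_txt_emb = F.normalize(teacher_txt_emb, dim=-1)

    # Student logits: similarity of every image to every caption in the batch.
    logits = img_emb @ txt_emb.t() / tau_student

    # Hard targets: the i-th image is paired with the i-th caption.
    n = img_emb.size(0)
    hard = torch.eye(n, device=logits.device)

    # Soft targets: the teachers' softened pairing distribution tolerates
    # partial matches in noisy web data instead of forcing a single positive.
    with torch.no_grad():
        soft = F.softmax(teacher_img_emb @ teacher_txt_emb.t() / tau_teacher, dim=-1)

    targets = alpha * hard + (1 - alpha) * soft

    # Symmetric cross-entropy over image-to-text and text-to-image directions.
    loss_i2t = -(targets * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()
    loss_t2i = -(targets.t() * F.log_softmax(logits.t(), dim=-1)).sum(dim=-1).mean()
    return (loss_i2t + loss_t2i) / 2
```

With `alpha = 1` this reduces to the standard CLIP objective; lowering `alpha` shifts weight toward the teachers' soft pairings, which is the distillation ingredient the abstract credits for data efficiency.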
Cite
Text
Cheng et al. "Data-Efficient Language-Supervised Zero-Shot Learning with Self-Distillation." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2021. doi:10.1109/CVPRW53098.2021.00348

Markdown

[Cheng et al. "Data-Efficient Language-Supervised Zero-Shot Learning with Self-Distillation." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2021.](https://mlanthology.org/cvprw/2021/cheng2021cvprw-dataefficient/) doi:10.1109/CVPRW53098.2021.00348

BibTeX
@inproceedings{cheng2021cvprw-dataefficient,
title = {{Data-Efficient Language-Supervised Zero-Shot Learning with Self-Distillation}},
author = {Cheng, Ruizhe and Wu, Bichen and Zhang, Peizhao and Vajda, Peter and Gonzalez, Joseph E.},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2021},
pages = {3119--3124},
doi = {10.1109/CVPRW53098.2021.00348},
url = {https://mlanthology.org/cvprw/2021/cheng2021cvprw-dataefficient/}
}