Teacher-Generated Spatial-Attention Labels Boost Robustness and Accuracy of Contrastive Models

Abstract

Human spatial attention conveys information about theregions of visual scenes that are important for perform-ing visual tasks. Prior work has shown that the informa-tion about human attention can be leveraged to benefit var-ious supervised vision tasks. Might providing this weakform of supervision be useful for self-supervised represen-tation learning? Addressing this question requires collect-ing large datasets with human attention labels. Yet, col-lecting such large scale data is very expensive. To addressthis challenge, we construct an auxiliary teacher model topredict human attention, trained on a relatively small la-beled dataset. This teacher model allows us to generate im-age (pseudo) attention labels for ImageNet. We then traina model with a primary contrastive objective; to this stan-dard configuration, we add a simple output head trained topredict the attentional map for each image, guided by thepseudo labels from teacher model. We measure the qual-ity of learned representations by evaluating classificationperformance from the frozen learned embeddings as wellas performance on image retrieval tasks. We find that thespatial-attention maps predicted from the contrastive modeltrained with teacher guidance aligns better with human at-tention compared to vanilla contrastive models. Moreover,we find that our approach improves classification accuracyand robustness of the contrastive models on ImageNet andImageNet-C. Further, we find that model representationsbecome more useful for image retrieval task as measuredby precision-recall performance on ImageNet, ImageNet-C,CIFAR10, and CIFAR10-C datasets.

Cite

Text

Yao et al. "Teacher-Generated Spatial-Attention Labels Boost Robustness and Accuracy of Contrastive Models." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.02230

Markdown

[Yao et al. "Teacher-Generated Spatial-Attention Labels Boost Robustness and Accuracy of Contrastive Models." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/yao2023cvpr-teachergenerated/) doi:10.1109/CVPR52729.2023.02230

BibTeX

@inproceedings{yao2023cvpr-teachergenerated,
  title     = {{Teacher-Generated Spatial-Attention Labels Boost Robustness and Accuracy of Contrastive Models}},
  author    = {Yao, Yushi and Ye, Chang and He, Junfeng and Elsayed, Gamaleldin F.},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {23282-23291},
  doi       = {10.1109/CVPR52729.2023.02230},
  url       = {https://mlanthology.org/cvpr/2023/yao2023cvpr-teachergenerated/}
}