With a Little Help from My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations

Abstract

Self-supervised learning algorithms based on instance discrimination train encoders to be invariant to pre-defined transformations of the same instance. While most methods treat different views of the same image as positives for a contrastive loss, we are interested in using positives from other instances in the dataset. Our method, Nearest-Neighbor Contrastive Learning of visual Representations (NNCLR), samples the nearest neighbors from the dataset in the latent space and treats them as positives. This provides more semantic variation than pre-defined transformations. We find that using the nearest neighbor as the positive in contrastive losses improves performance significantly on ImageNet classification, from 71.7% to 75.6%, outperforming previous state-of-the-art methods. On semi-supervised learning benchmarks, we improve performance significantly when only 1% of ImageNet labels are available, from 53.8% to 56.5%. On transfer learning benchmarks, our method outperforms state-of-the-art methods (including supervised learning with ImageNet) on 8 out of 12 downstream datasets. Furthermore, we demonstrate empirically that our method is less reliant on complex data augmentations: we see a relative reduction of only 2.1% in ImageNet Top-1 accuracy when we train using only random crops.
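The core idea in the abstract, replacing a view's own embedding with its nearest neighbor in latent space before computing a contrastive loss, can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the InfoNCE-style loss form, the cosine-similarity lookup, the `temperature` value, and the names `support`, `nearest_neighbors`, and `nn_contrastive_loss` are all assumptions made for the example.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row of x to unit L2 norm."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def nearest_neighbors(z, support):
    """For each row of z, return the most similar row of support.

    Both inputs are assumed to be L2-normalized, so the dot product
    is the cosine similarity.
    """
    sims = z @ support.T                  # (batch, support_size)
    return support[sims.argmax(axis=1)]   # (batch, dim)


def nn_contrastive_loss(z1, z2, support, temperature=0.1):
    """InfoNCE-style loss where the positive for z2[i] is the nearest
    neighbor of z1[i] in the support set (an assumed set of embeddings
    from other images), instead of z1[i] itself.
    """
    z1, z2, support = l2_normalize(z1), l2_normalize(z2), l2_normalize(support)
    nn = nearest_neighbors(z1, support)
    logits = nn @ z2.T / temperature      # (batch, batch)
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Matching pairs sit on the diagonal.
    return -np.mean(np.diag(log_prob))


# Example with random embeddings standing in for encoder outputs.
rng = np.random.default_rng(0)
view1 = rng.normal(size=(8, 16))
view2 = view1 + 0.05 * rng.normal(size=(8, 16))  # a lightly perturbed second view
support_set = rng.normal(size=(64, 16))
loss = nn_contrastive_loss(view1, view2, support_set)
```

In a training loop, the support set would typically be refreshed with embeddings from recent batches so that the neighbors track the current encoder; that bookkeeping is omitted here.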

Cite

Text

Dwibedi et al. "With a Little Help from My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations." International Conference on Computer Vision, 2021. doi:10.1109/ICCV48922.2021.00945

Markdown

[Dwibedi et al. "With a Little Help from My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations." International Conference on Computer Vision, 2021.](https://mlanthology.org/iccv/2021/dwibedi2021iccv-little/) doi:10.1109/ICCV48922.2021.00945

BibTeX

@inproceedings{dwibedi2021iccv-little,
  title     = {{With a Little Help from My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations}},
  author    = {Dwibedi, Debidatta and Aytar, Yusuf and Tompson, Jonathan and Sermanet, Pierre and Zisserman, Andrew},
  booktitle = {International Conference on Computer Vision},
  year      = {2021},
  pages     = {9588--9597},
  doi       = {10.1109/ICCV48922.2021.00945},
  url       = {https://mlanthology.org/iccv/2021/dwibedi2021iccv-little/}
}