Pixel-Level Correspondence for Self-Supervised Learning from Video

Abstract

While self-supervised learning has enabled effective representation learning in the absence of labels, for vision, video remains a relatively untapped source of supervision. To address this, we propose Pixel-level Correspondence (PiCo), a method for dense contrastive learning from video. By tracking points with optical flow, we obtain a correspondence map which can be used to match local features at different points in time. We validate PiCo on standard benchmarks, outperforming self-supervised baselines on multiple dense prediction tasks, without compromising performance on image classification.
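
The mechanism summarized in the abstract (a flow-derived correspondence map used to match local features across time) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `correspondence_from_flow` and `dense_info_nce`, the `(dx, dy)` flow layout, the nearest-pixel rounding, the InfoNCE-style loss, and the temperature value are all assumptions chosen for illustration, and the abstract does not specify any of them.

```python
import numpy as np

def correspondence_from_flow(flow):
    """Map each pixel of frame A to a pixel of frame B.

    flow: (H, W, 2) array of assumed (dx, dy) displacements from A to B.
    Returns integer targets (H, W, 2) stored as (x, y) and a bool mask (H, W)
    that is False where the tracked point leaves the image.
    """
    h, w, _ = flow.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    tx = np.rint(xs + flow[..., 0]).astype(int)  # nearest-pixel rounding (assumption)
    ty = np.rint(ys + flow[..., 1]).astype(int)
    valid = (tx >= 0) & (tx < w) & (ty >= 0) & (ty < h)
    return np.stack([tx, ty], axis=-1), valid

def dense_info_nce(feat_a, feat_b, target, valid, temperature=0.1):
    """InfoNCE-style loss over local features (illustrative choice of loss).

    feat_a, feat_b: (H, W, C) feature maps of two frames.
    For each valid pixel of A, the positive is the feature of B at its
    corresponding location; all other locations of B act as negatives.
    """
    h, w, c = feat_a.shape
    a = feat_a.reshape(-1, c)
    b = feat_b.reshape(-1, c)
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-8)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-8)
    keep = valid.reshape(-1)
    pos_index = (target[..., 1] * w + target[..., 0]).reshape(-1)[keep]
    logits = a[keep] @ b.T / temperature           # (N_valid, H*W)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(pos_index.size), pos_index].mean()
```

As a usage sketch, if frame B's features are frame A's features shifted one pixel to the right, a flow of `(1, 0)` at every pixel recovers the correct matches, and the loss should be lower than with an all-zero flow that pairs unrelated features. Pixels whose tracked location falls outside the image are masked out rather than matched.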

Cite

Text

Sharma et al. "Pixel-Level Correspondence for Self-Supervised Learning from Video." ICML 2022 Workshops: Pre-Training, 2022.

Markdown

[Sharma et al. "Pixel-Level Correspondence for Self-Supervised Learning from Video." ICML 2022 Workshops: Pre-Training, 2022.](https://mlanthology.org/icmlw/2022/sharma2022icmlw-pixellevel/)

BibTeX

@inproceedings{sharma2022icmlw-pixellevel,
  title     = {{Pixel-Level Correspondence for Self-Supervised Learning from Video}},
  author    = {Sharma, Yash and Zhu, Yi and Russell, Chris and Brox, Thomas},
  booktitle = {ICML 2022 Workshops: Pre-Training},
  year      = {2022},
  url       = {https://mlanthology.org/icmlw/2022/sharma2022icmlw-pixellevel/}
}