Learning Correspondence from the Cycle-Consistency of Time

Abstract

We introduce a self-supervised method for learning visual correspondence from unlabeled video. The main idea is to use cycle-consistency in time as a free supervisory signal for learning visual representations from scratch. At training time, our model learns a feature map representation that is useful for cycle-consistent tracking. At test time, we use the acquired representation to find nearest neighbors across space and time. We demonstrate that the representation generalizes, without finetuning, across a range of visual correspondence tasks, including video object segmentation, keypoint tracking, and optical flow. Our approach outperforms previous self-supervised methods and performs competitively with strongly supervised methods.
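The two phases described in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's tracker or training code. It is a simplified stand-in that composes dense soft affinities between whole per-frame feature maps. The function names (`l2_normalize`, `soft_affinity`, `cycle_consistency_loss`, `propagate_labels`), the temperature of 0.07, and k = 5 are illustrative assumptions. The training-time idea appears as a forward-then-backward walk whose loss is low when each position returns to where it started. The test-time idea appears as k-nearest-neighbor label propagation in feature space.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale each feature vector to unit length (hypothetical helper)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def soft_affinity(f_a, f_b, temperature=0.07):
    """Row-stochastic transition matrix from positions of f_a to positions of f_b.

    f_a: (N, C) and f_b: (M, C) are feature maps flattened over space.
    Assumption: cosine similarity plus a temperature-scaled softmax.
    """
    sim = l2_normalize(f_a) @ l2_normalize(f_b).T / temperature
    sim -= sim.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    p = np.exp(sim)
    return p / p.sum(axis=1, keepdims=True)


def cycle_consistency_loss(frames, temperature=0.07):
    """Walk forward through the frames, then back; penalize failing to return home.

    frames: list of (N, C) feature maps for consecutive frames.
    Simplified stand-in for cycle-consistent tracking, not the paper's loss.
    """
    n = frames[0].shape[0]
    transition = np.eye(n)
    # Forward: t0 -> t1 -> ... -> tK
    for f_a, f_b in zip(frames[:-1], frames[1:]):
        transition = transition @ soft_affinity(f_a, f_b, temperature)
    # Backward: tK -> ... -> t0
    rev = frames[::-1]
    for f_a, f_b in zip(rev[:-1], rev[1:]):
        transition = transition @ soft_affinity(f_a, f_b, temperature)
    # transition[i, i] is the probability that position i returns to itself
    return float(-np.mean(np.log(np.diag(transition) + 1e-12)))


def propagate_labels(f_ref, labels_ref, f_query, k=5, temperature=0.07):
    """Test time: give each query position a weighted vote of its k nearest reference features.

    labels_ref: (N, L) one-hot or soft labels. Returns (M, L) soft labels.
    """
    sim = l2_normalize(f_query) @ l2_normalize(f_ref).T  # (M, N) cosine similarities
    idx = np.argsort(-sim, axis=1)[:, :k]  # indices of the k most similar reference positions
    top = np.take_along_axis(sim, idx, axis=1) / temperature  # (M, k)
    top -= top.max(axis=1, keepdims=True)
    weights = np.exp(top)
    weights /= weights.sum(axis=1, keepdims=True)
    # labels_ref[idx] has shape (M, k, L); weight the neighbors' labels and sum over k
    return np.einsum("mk,mkl->ml", weights, labels_ref[idx])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.normal(size=(16, 32))
    # Frames whose features barely drift should close the cycle almost perfectly.
    smooth = [base + 0.01 * rng.normal(size=base.shape) for _ in range(3)]
    unrelated = [rng.normal(size=(16, 32)) for _ in range(3)]
    print("smooth video loss:", cycle_consistency_loss(smooth))
    print("unrelated frames loss:", cycle_consistency_loss(unrelated))
```

Using a softmax over similarities rather than a hard argmax keeps the composed transition differentiable, which is what would allow a cycle loss of this kind to train a feature encoder. In this sketch the features are fixed arrays, so nothing is learned. At test time, only the feature comparison is reused.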

Cite

Text

Wang et al. "Learning Correspondence from the Cycle-Consistency of Time." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019. doi:10.1109/CVPR.2019.00267

Markdown

[Wang et al. "Learning Correspondence from the Cycle-Consistency of Time." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.](https://mlanthology.org/cvpr/2019/wang2019cvpr-learning-b/) doi:10.1109/CVPR.2019.00267

BibTeX

@inproceedings{wang2019cvpr-learning-b,
  title     = {{Learning Correspondence from the Cycle-Consistency of Time}},
  author    = {Wang, Xiaolong and Jabri, Allan and Efros, Alexei A.},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2019},
  doi       = {10.1109/CVPR.2019.00267},
  url       = {https://mlanthology.org/cvpr/2019/wang2019cvpr-learning-b/}
}