Space-Time Correspondence as a Contrastive Random Walk
Abstract
This paper proposes a simple self-supervised approach for learning a representation for visual correspondence from raw video. We cast correspondence as prediction of links in a space-time graph constructed from video. In this graph, the nodes are patches sampled from each frame, and nodes adjacent in time can share a directed edge. We learn a representation in which pairwise similarity defines the transition probability of a random walk, so that long-range correspondence is computed as a walk along the graph. We optimize the representation to place high probability along paths of similarity. Targets for learning are formed without supervision, by cycle-consistency: the objective is to maximize the likelihood of returning to the initial node when walking along a graph constructed from a palindrome of frames. Thus, a single path-level constraint implicitly supervises chains of intermediate comparisons. When used as a similarity metric without adaptation, the learned representation outperforms the self-supervised state of the art on label propagation tasks involving objects, semantic parts, and pose. Moreover, we demonstrate that a technique we call edge dropout, as well as self-supervised adaptation at test time, further improves transfer for object-centric correspondence.
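The cycle-consistency objective described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `transition_matrix` and `cycle_walk_loss`, the use of cosine similarity, the softmax temperature of 0.07, and the assumption that every frame contributes the same number of patch embeddings are all choices made here for illustration. Each frame is represented as an array of shape `(num_nodes, dim)`; similarities between nodes in adjacent frames are turned into row-stochastic transition matrices, the matrices are chained along a palindrome of frames, and the loss is the negative log-probability of a walker returning to its starting node.

```python
import numpy as np


def transition_matrix(a, b, temperature=0.07):
    """Row-stochastic transitions from nodes in frame `a` to nodes in frame `b`.

    Assumption: similarity is cosine similarity followed by a softmax
    with a fixed temperature (value chosen for illustration).
    """
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def cycle_walk_loss(frames, temperature=0.07):
    """Negative log-likelihood of returning to the start node on a palindrome.

    `frames` is a list of (num_nodes, dim) arrays, one per video frame.
    Returns the scalar loss and the full round-trip transition matrix.
    """
    # Forward then backward in time, e.g. [f0, f1, f2] -> [f0, f1, f2, f1, f0].
    palindrome = frames + frames[-2::-1]
    num_nodes = frames[0].shape[0]
    walk = np.eye(num_nodes)
    for src, dst in zip(palindrome[:-1], palindrome[1:]):
        walk = walk @ transition_matrix(src, dst, temperature)
    # The target for each walker is its own starting node (the diagonal).
    return_prob = np.diag(walk)
    loss = -np.mean(np.log(return_prob + 1e-12))
    return loss, walk


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video = [rng.normal(size=(5, 16)) for _ in range(3)]  # 3 frames, 5 patches each
    loss, walk = cycle_walk_loss(video)
    print("loss:", loss)
    print("row sums:", walk.sum(axis=1))
```

In a training setting the embeddings would come from an encoder and the loss would be minimized by gradient descent in an autodiff framework; the sketch only shows how one path-level target (return to the start) scores a whole chain of intermediate comparisons.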
Cite
Text
Jabri et al. "Space-Time Correspondence as a Contrastive Random Walk." Neural Information Processing Systems, 2020.
Markdown
[Jabri et al. "Space-Time Correspondence as a Contrastive Random Walk." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/jabri2020neurips-spacetime/)
BibTeX
@inproceedings{jabri2020neurips-spacetime,
title = {{Space-Time Correspondence as a Contrastive Random Walk}},
author = {Jabri, Allan and Owens, Andrew and Efros, Alexei},
booktitle = {Neural Information Processing Systems},
year = {2020},
url = {https://mlanthology.org/neurips/2020/jabri2020neurips-spacetime/}
}