Contextualized Spatio-Temporal Contrastive Learning with Self-Supervision
Abstract
Modern self-supervised learning algorithms typically enforce persistency of instance representations across views. While very effective for learning holistic image and video representations, such an objective becomes suboptimal for learning spatio-temporally fine-grained features in videos, where scenes and instances evolve through space and time. In this paper, we present Contextualized Spatio-Temporal Contrastive Learning (ConST-CL) to effectively learn spatio-temporally fine-grained video representations via self-supervision. We first design a region-based pretext task which requires the model to transform instance representations from one view to another, guided by context features. Further, we introduce a simple network design that successfully reconciles the simultaneous learning process of both holistic and local representations. We evaluate our learned representations on a variety of downstream tasks and show that ConST-CL achieves competitive results on 6 datasets, including Kinetics, UCF, HMDB, AVA-Kinetics, AVA and OTB. Our code and models will be available.
Cite
Text
Yuan et al. "Contextualized Spatio-Temporal Contrastive Learning with Self-Supervision." Conference on Computer Vision and Pattern Recognition, 2022. doi:10.1109/CVPR52688.2022.01359
Markdown
[Yuan et al. "Contextualized Spatio-Temporal Contrastive Learning with Self-Supervision." Conference on Computer Vision and Pattern Recognition, 2022.](https://mlanthology.org/cvpr/2022/yuan2022cvpr-contextualized/) doi:10.1109/CVPR52688.2022.01359
BibTeX
@inproceedings{yuan2022cvpr-contextualized,
title = {{Contextualized Spatio-Temporal Contrastive Learning with Self-Supervision}},
author = {Yuan, Liangzhe and Qian, Rui and Cui, Yin and Gong, Boqing and Schroff, Florian and Yang, Ming-Hsuan and Adam, Hartwig and Liu, Ting},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2022},
pages = {13977-13986},
doi = {10.1109/CVPR52688.2022.01359},
url = {https://mlanthology.org/cvpr/2022/yuan2022cvpr-contextualized/}
}