Look Globally and Locally: Inter-Intra Contrastive Learning from Unlabeled Videos
Abstract
State-of-the-art video contrastive learning methods spatiotemporally augment two clips from the same video as positives. By only sampling positive clips from the same video, these methods neglect other semantically related videos that can also be useful. To address this limitation, we leverage nearest-neighbor videos from the global space as additional positives, thus improving diversity and introducing a more relaxed notion of similarity that extends beyond video and even class boundaries. Our Inter-Intra Video Contrastive Learning (IIVCL) improves performance and generalization on video classification, detection, and retrieval tasks.
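The idea in the abstract — keeping the usual intra-video positive pair while also treating each clip's nearest neighbor from a pool of other videos as an extra inter-video positive — can be sketched as below. This is a minimal illustration under assumptions, not the authors' implementation: the function name `inter_intra_nce`, the equal weighting of the two terms, and the use of a simple embedding queue are all hypothetical choices for the sketch.

```python
import torch
import torch.nn.functional as F

def inter_intra_nce(z1, z2, queue, temperature=0.1):
    """Illustrative inter-intra contrastive loss (a sketch, not the paper's code).

    z1, z2: (B, D) embeddings of two clips from the same video (intra positives).
    queue:  (Q, D) embeddings of clips from other videos; each anchor's nearest
            neighbor in the queue serves as an additional inter-video positive.
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    queue = F.normalize(queue, dim=1)

    # Nearest-neighbor (inter-video) positive for each anchor in z1,
    # found by cosine similarity against the queue.
    nn_idx = (z1 @ queue.T).argmax(dim=1)   # (B,)
    z_nn = queue[nn_idx]                    # (B, D)

    labels = torch.arange(z1.size(0))

    # Intra-video term: the other clip of the same video is the positive,
    # all other batch entries act as negatives (standard InfoNCE).
    logits_intra = z1 @ z2.T / temperature
    loss_intra = F.cross_entropy(logits_intra, labels)

    # Inter-video term: the nearest neighbor is the positive, which relaxes
    # the notion of similarity beyond the boundaries of a single video.
    logits_inter = z1 @ z_nn.T / temperature
    loss_inter = F.cross_entropy(logits_inter, labels)

    # Equal weighting of the two terms is an assumption of this sketch.
    return 0.5 * (loss_intra + loss_inter)
```

In practice the queue would be populated with momentum-encoder features from past batches, but any bank of embeddings from other videos suffices to demonstrate the idea.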
Cite
Text
Fan et al. "Look Globally and Locally: Inter-Intra Contrastive Learning from Unlabeled Videos." ICLR 2023 Workshops: ME-FoMo, 2023.
Markdown
[Fan et al. "Look Globally and Locally: Inter-Intra Contrastive Learning from Unlabeled Videos." ICLR 2023 Workshops: ME-FoMo, 2023.](https://mlanthology.org/iclrw/2023/fan2023iclrw-look/)
BibTeX
@inproceedings{fan2023iclrw-look,
  title     = {{Look Globally and Locally: Inter-Intra Contrastive Learning from Unlabeled Videos}},
  author    = {Fan, David and Yang, Deyu and Li, Xinyu and Bhat, Vimal and Mv, Rohith},
  booktitle = {ICLR 2023 Workshops: ME-FoMo},
  year      = {2023},
  url       = {https://mlanthology.org/iclrw/2023/fan2023iclrw-look/}
}