Multiview Pseudo-Labeling for Semi-Supervised Learning from Video

Abstract

We present a multiview pseudo-labeling approach to video learning, a novel framework that uses complementary views in the form of appearance and motion information for semi-supervised learning in video. The complementary views help obtain more reliable "pseudo-labels" on unlabeled video, which in turn yields stronger video representations than learning from purely supervised data. Though our method capitalizes on multiple views, it nonetheless trains a model that is shared across appearance and motion input and thus, by design, incurs no additional computational overhead at inference time. On multiple video recognition datasets, our method substantially outperforms its supervised counterpart, and compares favorably to previous work on standard benchmarks in self-supervised video representation learning.
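As a rough illustration of the idea of combining views to obtain pseudo-labels, the sketch below averages per-view class probabilities for one unlabeled clip and keeps the prediction only if it is confident enough. The abstract does not specify how views are aggregated or filtered, so the averaging rule, the `threshold` value, and the names `softmax` and `multiview_pseudo_label` are all assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(logits):
    """Convert a list of class logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multiview_pseudo_label(view_logits, threshold=0.8):
    """Hypothetical helper: derive a pseudo-label for one unlabeled clip.

    view_logits: dict mapping a view name (e.g. "appearance", "motion")
                 to the class logits produced for that view.
    threshold:   assumed confidence cutoff; not taken from the paper.

    Returns (label, confidence); label is None when the averaged
    prediction is not confident enough to be used as a pseudo-label.
    """
    probs = [softmax(logits) for logits in view_logits.values()]
    n_classes = len(probs[0])
    # Assumed aggregation rule: simple mean of per-view probabilities.
    avg = [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
    confidence = max(avg)
    label = avg.index(confidence)
    return (label if confidence >= threshold else None), confidence
```

When the views agree, for example `multiview_pseudo_label({"appearance": [2.0, 0.1, 0.1], "motion": [3.0, 0.0, 0.2]})`, the clip receives a pseudo-label. When they disagree, the averaged confidence drops below the cutoff and the clip is skipped, which is one simple way complementary views could make pseudo-labels more reliable.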

Cite

Text

Xiong et al. "Multiview Pseudo-Labeling for Semi-Supervised Learning from Video." International Conference on Computer Vision, 2021. doi:10.1109/ICCV48922.2021.00712

Markdown

[Xiong et al. "Multiview Pseudo-Labeling for Semi-Supervised Learning from Video." International Conference on Computer Vision, 2021.](https://mlanthology.org/iccv/2021/xiong2021iccv-multiview/) doi:10.1109/ICCV48922.2021.00712

BibTeX

@inproceedings{xiong2021iccv-multiview,
  title     = {{Multiview Pseudo-Labeling for Semi-Supervised Learning from Video}},
  author    = {Xiong, Bo and Fan, Haoqi and Grauman, Kristen and Feichtenhofer, Christoph},
  booktitle = {International Conference on Computer Vision},
  year      = {2021},
  pages     = {7209--7219},
  doi       = {10.1109/ICCV48922.2021.00712},
  url       = {https://mlanthology.org/iccv/2021/xiong2021iccv-multiview/}
}