Self-Supervised Motion Learning from Static Images

Abstract

Motions are reflected in videos as the movement of pixels, and actions are essentially patterns of inconsistent motions between the foreground and the background. To distinguish actions well, especially those with complicated spatio-temporal interactions, correctly locating the prominent motion areas is crucial. However, most motion information in existing videos is difficult to label, so training a model with good motion representations under supervision requires a large amount of human annotation labour. In this paper, we address this problem with self-supervised learning. Specifically, we propose to learn Motion from Static Images (MoSI). The model learns to encode motion information by classifying pseudo motions generated by MoSI. We furthermore introduce a static mask in pseudo motions to create local motion patterns, which forces the model to additionally locate notable motion areas in order to classify correctly. We demonstrate that MoSI can discover regions with large motion even without fine-tuning on the downstream datasets. As a result, the learned motion representations boost performance on tasks that require understanding of complex scenes and motions, i.e., action recognition. Extensive experiments show the consistent and transferable improvements achieved by MoSI. Code will be released soon.
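The idea of generating pseudo motions from a static image can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `pseudo_motion_clip` and `apply_static_mask`, the per-frame displacement parameters `dx` and `dy`, and the square mask layout are all assumptions made for illustration. The sketch slides a crop window across one image to produce a clip whose motion direction and speed are known, then freezes everything outside a square region so that motion is only local.

```python
import numpy as np


def pseudo_motion_clip(image, crop=64, frames=8, dx=0, dy=0):
    """Slide a crop window over a static image to build a pseudo-motion clip.

    dx, dy are per-frame window displacements in pixels (a hypothetical
    parameterization); the pair (dx, dy) can serve as the class label.
    """
    h, w = image.shape[:2]
    span_x = abs(dx) * (frames - 1)  # total horizontal travel of the window
    span_y = abs(dy) * (frames - 1)  # total vertical travel of the window
    if crop + span_x > w or crop + span_y > h:
        raise ValueError("image too small for this crop size and displacement")
    # Start at the far edge when moving in the negative direction,
    # so the window never leaves the image.
    x0 = span_x if dx < 0 else 0
    y0 = span_y if dy < 0 else 0
    return np.stack([
        image[y0 + t * dy : y0 + t * dy + crop,
              x0 + t * dx : x0 + t * dx + crop]
        for t in range(frames)
    ])


def apply_static_mask(clip, top, left, size):
    """Keep motion only inside a square region (assumed mask layout).

    Pixels outside the region are frozen to the first frame.
    """
    out = np.repeat(clip[:1], clip.shape[0], axis=0).copy()
    out[:, top:top + size, left:left + size] = \
        clip[:, top:top + size, left:left + size]
    return out


# Usage: a synthetic image stands in for a real photo.
image = np.arange(100 * 100).reshape(100, 100)
clip = pseudo_motion_clip(image, crop=32, frames=4, dx=3, dy=0)
masked = apply_static_mask(clip, top=8, left=8, size=16)
label = (3, 0)  # the known displacement acts as the classification target
```

Under these assumptions, a classifier trained to predict `label` from `masked` has to find the moving region before it can tell the direction and speed apart, which is the intuition the abstract describes for the static mask.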

Cite

Text

Huang et al. "Self-Supervised Motion Learning from Static Images." Conference on Computer Vision and Pattern Recognition, 2021. doi:10.1109/CVPR46437.2021.00133

Markdown

[Huang et al. "Self-Supervised Motion Learning from Static Images." Conference on Computer Vision and Pattern Recognition, 2021.](https://mlanthology.org/cvpr/2021/huang2021cvpr-selfsupervised/) doi:10.1109/CVPR46437.2021.00133

BibTeX

@inproceedings{huang2021cvpr-selfsupervised,
  title     = {{Self-Supervised Motion Learning from Static Images}},
  author    = {Huang, Ziyuan and Zhang, Shiwei and Jiang, Jianwen and Tang, Mingqian and Jin, Rong and Ang, Marcelo H.},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2021},
  pages     = {1276-1285},
  doi       = {10.1109/CVPR46437.2021.00133},
  url       = {https://mlanthology.org/cvpr/2021/huang2021cvpr-selfsupervised/}
}