3D Human Action Representation Learning via Cross-View Consistency Pursuit

Abstract

In this work, we propose a Cross-view Contrastive Learning framework for unsupervised 3D skeleton-based action representation (CrosSCLR), which leverages complementary supervision signals across multiple views. CrosSCLR consists of a single-view contrastive learning module (SkeletonCLR) and a cross-view consistent knowledge mining module (CVC-KM), integrated in a collaborative learning manner. CVC-KM exchanges high-confidence positive/negative samples and their distributions among views according to their embedding similarity, ensuring cross-view consistency of the contrastive context, i.e., similar distributions. Extensive experiments show that CrosSCLR achieves remarkable action recognition results on the NTU-60 and NTU-120 datasets under unsupervised settings, and yields observably higher-quality action representations. Our code is available at https://github.com/LinguoLi/CrosSCLR.
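The abstract describes two ingredients: an InfoNCE-style single-view contrastive loss (SkeletonCLR) and cross-view mining of high-confidence samples by embedding similarity (CVC-KM). The sketch below illustrates both ideas with NumPy; it is a minimal illustration, not the authors' implementation — the function names, the memory-bank representation, and the `top_k` nearest-neighbor mining rule are simplifying assumptions, and details such as momentum encoders and distribution exchange are omitted.

```python
import numpy as np

def infonce_loss(q, k_pos, bank, temperature=0.07):
    """InfoNCE-style contrastive loss for one query embedding (sketch).

    q: (d,) query embedding, k_pos: (d,) positive key,
    bank: (N, d) memory bank of negative embeddings.
    """
    q = q / np.linalg.norm(q)
    k_pos = k_pos / np.linalg.norm(k_pos)
    bank = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    # Positive similarity at index 0, negatives after it.
    logits = np.concatenate([[q @ k_pos], bank @ q]) / temperature
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])

def mine_cross_view_neighbors(q_view_a, bank_view_a, top_k=2):
    """CVC-KM idea (sketch): indices of the bank entries most similar to
    the query in view A, to be shared with view B as extra positives."""
    bank_n = bank_view_a / np.linalg.norm(bank_view_a, axis=1, keepdims=True)
    sims = bank_n @ (q_view_a / np.linalg.norm(q_view_a))
    return np.argsort(sims)[-top_k:]             # high-confidence neighbors
```

In this simplified picture, each skeleton view (e.g., joint, motion, bone) keeps its own encoder and memory bank; the neighbor indices mined in one view re-label those same samples as positives in the other view's loss, which is what couples the views during training.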

Cite

Text

Li et al. "3D Human Action Representation Learning via Cross-View Consistency Pursuit." Conference on Computer Vision and Pattern Recognition, 2021. doi:10.1109/CVPR46437.2021.00471

Markdown

[Li et al. "3D Human Action Representation Learning via Cross-View Consistency Pursuit." Conference on Computer Vision and Pattern Recognition, 2021.](https://mlanthology.org/cvpr/2021/li2021cvpr-3d/) doi:10.1109/CVPR46437.2021.00471

BibTeX

@inproceedings{li2021cvpr-3d,
  title     = {{3D Human Action Representation Learning via Cross-View Consistency Pursuit}},
  author    = {Li, Linguo and Wang, Minsi and Ni, Bingbing and Wang, Hang and Yang, Jiancheng and Zhang, Wenjun},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2021},
  pages     = {4741--4750},
  doi       = {10.1109/CVPR46437.2021.00471},
  url       = {https://mlanthology.org/cvpr/2021/li2021cvpr-3d/}
}