Bootstrapped Representation Learning for Skeleton-Based Action Recognition
Abstract
In this work, we study self-supervised representation learning for 3D skeleton-based action recognition. We extend Bootstrap Your Own Latent (BYOL) for representation learning on skeleton sequence data and propose a new data augmentation strategy including two asymmetric transformation pipelines. We also introduce a multi-viewpoint sampling method that leverages multiple viewing angles of the same action captured by different cameras. In the semi-supervised setting, we show that performance can be further improved by knowledge distillation from wider networks, leveraging the unlabeled samples once more. We conduct extensive experiments on the NTU-60, NTU-120 and PKU-MMD datasets to demonstrate the performance of our proposed method. Our method consistently outperforms the current state of the art on linear evaluation, semi-supervised and transfer learning benchmarks.
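The abstract builds on BYOL's online/target network setup with two asymmetric augmentation pipelines. The following is a minimal, illustrative PyTorch sketch of such a BYOL-style objective; the encoder, MLP sizes, momentum value, and all names are placeholder assumptions, not the paper's actual architecture or hyper-parameters.

```python
# Minimal BYOL-style objective with two views from asymmetric augmentation
# pipelines. Illustrative sketch only; dimensions and momentum are assumptions.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def mlp(in_dim, hidden_dim=256, out_dim=128):
    return nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.BatchNorm1d(hidden_dim),
                         nn.ReLU(inplace=True), nn.Linear(hidden_dim, out_dim))

class BYOL(nn.Module):
    def __init__(self, encoder, feat_dim, momentum=0.996):
        super().__init__()
        self.online_encoder = encoder              # e.g. a skeleton sequence encoder
        self.online_projector = mlp(feat_dim)
        self.predictor = mlp(128)
        # Target network is an exponential moving average (EMA) of the online network.
        self.target_encoder = copy.deepcopy(encoder)
        self.target_projector = copy.deepcopy(self.online_projector)
        for p in list(self.target_encoder.parameters()) + list(self.target_projector.parameters()):
            p.requires_grad = False
        self.momentum = momentum

    @torch.no_grad()
    def update_target(self):
        # EMA update of the target parameters toward the online parameters.
        for online, target in [(self.online_encoder, self.target_encoder),
                               (self.online_projector, self.target_projector)]:
            for po, pt in zip(online.parameters(), target.parameters()):
                pt.data.mul_(self.momentum).add_(po.data, alpha=1 - self.momentum)

    def loss(self, view1, view2):
        # view1 / view2 are produced by two *asymmetric* augmentation pipelines.
        def regression_loss(x, y):
            x, y = F.normalize(x, dim=-1), F.normalize(y, dim=-1)
            return 2 - 2 * (x * y).sum(dim=-1)   # equivalent to MSE of normalized vectors

        p1 = self.predictor(self.online_projector(self.online_encoder(view1)))
        p2 = self.predictor(self.online_projector(self.online_encoder(view2)))
        with torch.no_grad():
            z1 = self.target_projector(self.target_encoder(view1))
            z2 = self.target_projector(self.target_encoder(view2))
        # Symmetrized loss: predict each view's target projection from the other view.
        return (regression_loss(p1, z2) + regression_loss(p2, z1)).mean()
```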
Cite
Text
Moliner et al. "Bootstrapped Representation Learning for Skeleton-Based Action Recognition." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022. doi:10.1109/CVPRW56347.2022.00460
Markdown
[Moliner et al. "Bootstrapped Representation Learning for Skeleton-Based Action Recognition." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022.](https://mlanthology.org/cvprw/2022/moliner2022cvprw-bootstrapped/) doi:10.1109/CVPRW56347.2022.00460
BibTeX
@inproceedings{moliner2022cvprw-bootstrapped,
title = {{Bootstrapped Representation Learning for Skeleton-Based Action Recognition}},
author = {Moliner, Olivier and Huang, Sangxia and Åström, Kalle},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2022},
pages = {4153--4163},
doi = {10.1109/CVPRW56347.2022.00460},
url = {https://mlanthology.org/cvprw/2022/moliner2022cvprw-bootstrapped/}
}