Learning Spatial-Preserved Skeleton Representations for Few-Shot Action Recognition
Abstract
Few-shot action recognition aims to recognize novel action classes from only a few labeled examples and has attracted growing attention due to its practical significance. Human skeletons provide an explainable and data-efficient representation for this problem by explicitly modeling spatial-temporal relations among skeleton joints. However, existing skeleton-based spatial-temporal models tend to deteriorate the positional distinguishability of joints, which leads to fuzzy spatial matching and poor explainability. To address these issues, we propose a novel spatial matching strategy consisting of spatial disentanglement and spatial activation. The motivation behind spatial disentanglement is our finding that more spatial information for leaf nodes (e.g., the “hand” joint) helps increase representation diversity for skeleton matching. To achieve spatial disentanglement, we encourage the skeletons to be represented in a full-rank space with a rank maximization constraint. Finally, an attention-based spatial activation mechanism is introduced to incorporate the disentanglement by adaptively adjusting the disentangled joints according to matching pairs. Extensive experiments on three skeleton benchmarks demonstrate that the proposed spatial matching strategy can be effectively inserted into existing temporal alignment frameworks, achieving considerable performance improvements as well as inherent explainability.
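To make the rank maximization idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a joints-by-channels feature matrix and uses the nuclear norm (sum of singular values) as a stand-in surrogate for rank; the abstract does not specify the actual loss, and the function names `nuclear_norm` and `rank_surrogate_loss` are hypothetical.

```python
import numpy as np


def nuclear_norm(features):
    """Sum of singular values of a (joints x channels) feature matrix."""
    return float(np.linalg.svd(features, compute_uv=False).sum())


def rank_surrogate_loss(features):
    """Negative nuclear norm of row-normalized joint features.

    Assumption: minimizing this value pushes joint features toward a
    higher-rank (more mutually distinguishable) configuration. This is a
    common surrogate, not necessarily the constraint used in the paper.
    """
    # Normalize each joint's feature so the score reflects diversity,
    # not feature magnitude.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)
    return -nuclear_norm(unit)


# Three joints that all share one feature vector (rank 1).
collapsed = np.tile(np.array([[1.0, 0.0, 0.0]]), (3, 1))
# Three joints with mutually orthogonal features (rank 3).
distinct = np.eye(3)

loss_collapsed = rank_surrogate_loss(collapsed)
loss_distinct = rank_surrogate_loss(distinct)
```

Under these assumptions, the orthogonal joint features receive a lower loss than the collapsed ones, which is the behavior a rank-encouraging term is meant to reward.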
Cite
Text
Ma et al. "Learning Spatial-Preserved Skeleton Representations for Few-Shot Action Recognition." Proceedings of the European Conference on Computer Vision (ECCV), 2022. doi:10.1007/978-3-031-19772-7_11
Markdown
[Ma et al. "Learning Spatial-Preserved Skeleton Representations for Few-Shot Action Recognition." Proceedings of the European Conference on Computer Vision (ECCV), 2022.](https://mlanthology.org/eccv/2022/ma2022eccv-learning/) doi:10.1007/978-3-031-19772-7_11
BibTeX
@inproceedings{ma2022eccv-learning,
title = {{Learning Spatial-Preserved Skeleton Representations for Few-Shot Action Recognition}},
author = {Ma, Ning and Zhang, Hongyi and Li, Xuhui and Zhou, Sheng and Zhang, Zhen and Wen, Jun and Li, Haifeng and Gu, Jingjun and Bu, Jiajun},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2022},
doi = {10.1007/978-3-031-19772-7_11},
url = {https://mlanthology.org/eccv/2022/ma2022eccv-learning/}
}