SCD-Net: Spatiotemporal Clues Disentanglement Network for Self-Supervised Skeleton-Based Action Recognition

Wu, Cong; Wu, Xiao-Jun; Kittler, Josef; Xu, Tianyang; Ahmed, Sara; Awais, Muhammad; Feng, Zhenhua

doi:10.1609/AAAI.V38I6.28409

AAAI 2024 pp. 5949-5957

Abstract

Contrastive learning has achieved great success in skeleton-based action recognition. However, most existing approaches encode skeleton sequences as entangled spatiotemporal representations and confine the contrasts to a single level of representation. In contrast, this paper introduces a novel contrastive learning framework, the Spatiotemporal Clues Disentanglement Network (SCD-Net). Specifically, we integrate a decoupling module with a feature extractor to derive explicit clues from the spatial and temporal domains, respectively. To train SCD-Net, we construct a global anchor and encourage interaction between the anchor and the extracted clues. Further, we propose a new masking strategy with structural constraints to strengthen the contextual associations, bringing recent developments in masked image modelling into SCD-Net. We conduct extensive evaluations on the NTU-RGB+D (60&120) and PKU-MMD (I&II) datasets, covering various downstream tasks such as action recognition, action retrieval, transfer learning, and semi-supervised learning. The experimental results demonstrate the effectiveness of our method, which significantly outperforms the existing state-of-the-art (SOTA) approaches. Our code and supplementary material can be found at https://github.com/cong-wu/SCD-Net.
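
As a rough illustration of what contrasting a global anchor against separate spatial and temporal clues could look like, the sketch below computes a generic InfoNCE-style loss between an anchor embedding and each clue embedding and sums the two terms. It is not the authors' implementation: the abstract does not specify the loss, so InfoNCE, the temperature value, and the function names `info_nce` and `anchor_clue_loss` are all assumptions made for illustration. The released code at the URL above is the authoritative reference.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(queries, keys, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    queries[i] and keys[i] are treated as the positive pair; every other
    key in the batch serves as a negative for queries[i].
    """
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        # Cosine similarity of this query to every key, scaled by temperature.
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        # Numerically stable log-sum-exp.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy with the matching key as the target class.
        total += log_sum_exp - logits[i]
    return total / len(q)


def anchor_clue_loss(anchor, spatial, temporal, temperature=0.1):
    """Hypothetical combined objective: anchor vs. spatial plus anchor vs. temporal."""
    return (info_nce(anchor, spatial, temperature)
            + info_nce(anchor, temporal, temperature))


# Toy batch of two samples with 2-D embeddings (made-up values).
anchor = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[2.0, 0.0], [0.0, 3.0]]   # each clue points the same way as its anchor
swapped = [[0.0, 3.0], [2.0, 0.0]]   # clues paired with the wrong anchor

loss_aligned = anchor_clue_loss(anchor, aligned, aligned)
loss_swapped = anchor_clue_loss(anchor, swapped, swapped)
```

In this toy batch the loss is close to zero when each clue agrees with its own anchor and much larger when the pairs are swapped, which is the behaviour a contrastive objective of this kind relies on.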

Cite

Text

Wu et al. "SCD-Net: Spatiotemporal Clues Disentanglement Network for Self-Supervised Skeleton-Based Action Recognition." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I6.28409

Markdown

[Wu et al. "SCD-Net: Spatiotemporal Clues Disentanglement Network for Self-Supervised Skeleton-Based Action Recognition." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/wu2024aaai-scd/) doi:10.1609/AAAI.V38I6.28409

BibTeX

@inproceedings{wu2024aaai-scd,
  title     = {{SCD-Net: Spatiotemporal Clues Disentanglement Network for Self-Supervised Skeleton-Based Action Recognition}},
  author    = {Wu, Cong and Wu, Xiao-Jun and Kittler, Josef and Xu, Tianyang and Ahmed, Sara and Awais, Muhammad and Feng, Zhenhua},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {5949-5957},
  doi       = {10.1609/AAAI.V38I6.28409},
  url       = {https://mlanthology.org/aaai/2024/wu2024aaai-scd/}
}