USDRL: Unified Skeleton-Based Dense Representation Learning with Multi-Grained Feature Decorrelation

Abstract

Contrastive learning has recently achieved great success in skeleton-based representation learning. However, prevailing methods are predominantly negative-based: they require an additional momentum encoder and a memory bank to obtain negative samples, which makes model training more difficult. Furthermore, these methods concentrate primarily on learning a global representation for recognition and retrieval tasks, overlooking the rich, detailed local representations that are crucial for dense prediction tasks. To alleviate these issues, we introduce USDRL, a Unified Skeleton-based Dense Representation Learning framework based on feature decorrelation. USDRL applies feature decorrelation across the temporal, spatial, and instance domains in a multi-grained manner, reducing redundancy among the dimensions of the representations so that the features carry as much information as possible. Additionally, we design a Dense Spatio-Temporal Encoder (DSTE) to capture fine-grained action representations effectively, thereby improving performance on dense prediction tasks. Comprehensive experiments on the NTU-60, NTU-120, PKU-MMD I, and PKU-MMD II benchmarks, across diverse downstream tasks including action recognition, action retrieval, and action detection, demonstrate that our approach significantly outperforms current state-of-the-art (SOTA) approaches.
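
To make the idea of reducing redundancy among representation dimensions concrete, the following is a minimal sketch of a generic off-diagonal correlation penalty. It is an illustration only and not necessarily the loss used in USDRL: the function name `off_diagonal_correlation_penalty`, the `eps` parameter, and the toy inputs are all assumptions introduced here. It scores a single batch of feature vectors; how such features would be obtained for the temporal, spatial, and instance domains is not specified in the abstract and is not modeled.

```python
import math

def off_diagonal_correlation_penalty(features, eps=1e-8):
    """Mean squared off-diagonal entry of the feature correlation matrix.

    `features` is a list of N equal-length vectors (N samples, D dimensions).
    Hypothetical helper for illustration; not taken from the paper.
    """
    n, d = len(features), len(features[0])
    # Standardize each dimension to zero mean and (near) unit variance.
    columns = []
    for j in range(d):
        col = [row[j] for row in features]
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n + eps)
        columns.append([(x - mean) / std for x in col])
    # Accumulate squared correlations between every pair of distinct dimensions.
    total = 0.0
    for i in range(d):
        for j in range(d):
            if i != j:
                corr = sum(a * b for a, b in zip(columns[i], columns[j])) / n
                total += corr ** 2
    return total / (d * (d - 1)) if d > 1 else 0.0

# Toy inputs (assumed): in `redundant`, dimension 1 is exactly twice dimension 0.
redundant = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
decorrelated = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
penalty_redundant = off_diagonal_correlation_penalty(redundant)
penalty_decorrelated = off_diagonal_correlation_penalty(decorrelated)
```

In this sketch the redundant features receive a penalty close to 1 and the decorrelated ones a penalty of 0, so minimizing such a term would push the dimensions toward carrying non-overlapping information.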

Cite

Text

Weng et al. "USDRL: Unified Skeleton-Based Dense Representation Learning with Multi-Grained Feature Decorrelation." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I8.32899

Markdown

[Weng et al. "USDRL: Unified Skeleton-Based Dense Representation Learning with Multi-Grained Feature Decorrelation." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/weng2025aaai-usdrl/) doi:10.1609/AAAI.V39I8.32899

BibTeX

@inproceedings{weng2025aaai-usdrl,
  title     = {{USDRL: Unified Skeleton-Based Dense Representation Learning with Multi-Grained Feature Decorrelation}},
  author    = {Weng, Wanjiang and Wang, Hongsong and Wang, Junbo and He, Lei and Xie, Guo-Sen},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {8332--8340},
  doi       = {10.1609/AAAI.V39I8.32899},
  url       = {https://mlanthology.org/aaai/2025/weng2025aaai-usdrl/}
}