DuoCLR: Dual-Surrogate Contrastive Learning for Skeleton-Based Human Action Segmentation

Abstract

In this paper, a contrastive representation learning framework is proposed to enhance human action segmentation via pre-training on trimmed (single-action) skeleton sequences. Unlike previous representation learning works that are tailored for action recognition and build upon isolated sequence-wise representations, the proposed framework exploits multi-scale representations in conjunction with cross-sequence variations. More specifically, it proposes a novel data augmentation strategy, "Shuffle and Warp", which exploits diverse multi-action permutations. This augmentation supports two surrogate tasks introduced for contrastive learning: Cross Permutation Contrasting (CPC) and Relative Order Reasoning (ROR). During optimization, CPC learns intra-class similarities by contrasting representations of the same action class across different permutations, while ROR reasons about inter-class contexts by predicting the relative mapping between two permutations. Together, these tasks enable a Dual-Surrogate Contrastive Learning (DuoCLR) network to learn multi-scale feature representations optimized for action segmentation. In experiments, DuoCLR is pre-trained on a trimmed skeleton dataset and evaluated on an untrimmed dataset, where it demonstrates a significant boost over state-of-the-art methods in both multi-class and multi-label action segmentation tasks. Lastly, ablation studies are conducted to evaluate the effectiveness of each component of the proposed approach.
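
The abstract does not specify how "Shuffle and Warp" is implemented, so the following is only a minimal sketch of one plausible reading: trimmed clips are shuffled into a random order, each clip is temporally resampled to a random length, and the results are concatenated into a synthetic multi-action sequence. The function names (`warp`, `shuffle_and_warp`, `relative_order`), the nearest-index resampling, and the `min_scale`/`max_scale` range are all assumptions made for illustration and are not taken from the paper. Frame-wise labels from two such views could identify same-class positives for a CPC-style loss, and the slot mapping between the two orders could serve as an ROR-style prediction target.

```python
import random

def warp(clip, new_len):
    """Resample a non-empty clip (list of frames) to new_len frames
    using nearest-index sampling (an assumed, simple form of warping)."""
    old_len = len(clip)
    if new_len == 1:
        return [clip[0]]
    return [clip[round(i * (old_len - 1) / (new_len - 1))] for i in range(new_len)]

def shuffle_and_warp(clips, labels, rng, min_scale=0.5, max_scale=1.5):
    """Build one synthetic multi-action view from trimmed clips.

    Returns (frames, frame_labels, order), where order[slot] is the index
    of the original clip placed at that slot. The scale range is hypothetical.
    """
    order = list(range(len(clips)))
    rng.shuffle(order)
    frames, frame_labels = [], []
    for idx in order:
        new_len = max(1, round(len(clips[idx]) * rng.uniform(min_scale, max_scale)))
        warped = warp(clips[idx], new_len)
        frames.extend(warped)
        frame_labels.extend([labels[idx]] * len(warped))
    return frames, frame_labels, order

def relative_order(order_a, order_b):
    """For each slot in view A, return the slot in view B holding the same clip."""
    slot_in_b = {clip_idx: slot for slot, clip_idx in enumerate(order_b)}
    return [slot_in_b[clip_idx] for clip_idx in order_a]

# Toy usage: each "frame" is a placeholder tuple rather than real joint coordinates.
rng = random.Random(0)
clips = [[("wave", t) for t in range(4)],
         [("sit", t) for t in range(6)],
         [("walk", t) for t in range(5)]]
labels = ["wave", "sit", "walk"]
frames_a, labels_a, order_a = shuffle_and_warp(clips, labels, rng)
frames_b, labels_b, order_b = shuffle_and_warp(clips, labels, rng)
mapping = relative_order(order_a, order_b)
```

In this sketch the two views contain the same actions in different orders and at different speeds, which is the kind of cross-sequence variation the abstract describes; how the actual method encodes, samples, and contrasts these views is not stated here.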

Cite

Text

Tian. "DuoCLR: Dual-Surrogate Contrastive Learning for Skeleton-Based Human Action Segmentation." International Conference on Computer Vision, 2025.

Markdown

[Tian. "DuoCLR: Dual-Surrogate Contrastive Learning for Skeleton-Based Human Action Segmentation." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/tian2025iccv-duoclr/)

BibTeX

@inproceedings{tian2025iccv-duoclr,
  title     = {{DuoCLR: Dual-Surrogate Contrastive Learning for Skeleton-Based Human Action Segmentation}},
  author    = {Tian, Haitao},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {13772--13782},
  url       = {https://mlanthology.org/iccv/2025/tian2025iccv-duoclr/}
}