Stitch, Contrast, and Segment: Learning a Human Action Segmentation Model Using Trimmed Skeleton Videos

Tian, Haitao; Payeur, Pierre

doi:10.1609/AAAI.V39I7.32792

AAAI 2025 pp. 7365-7373

Abstract

Existing skeleton-based human action classification models rely on well-trimmed action-specific skeleton videos for both training and testing, precluding their scalability to real-world applications where untrimmed videos exhibiting concatenated actions are predominant. To overcome this limitation, recently introduced skeleton action segmentation models incorporate untrimmed skeleton videos into end-to-end training. The model is optimized to provide frame-wise predictions for testing videos of any length, simultaneously realizing action localization and classification. Yet achieving such an improvement requires frame-wise annotated skeleton videos, which remain time-consuming to produce in practice. This paper features a novel framework for skeleton-based action segmentation that is trained on short trimmed skeleton videos but can run on longer untrimmed videos. The approach is implemented in three steps: Stitch, Contrast, and Segment. First, Stitch proposes a temporal skeleton stitching scheme that treats trimmed skeleton videos as elementary human motions that compose a semantic space and can be sampled to generate multi-action stitched sequences. Next, Contrast learns contrastive representations from stitched sequences with a novel discrimination pretext task that enables a skeleton encoder to learn meaningful action-temporal contexts to improve action segmentation. Finally, Segment relates the proposed method to action segmentation by learning a segmentation layer while handling particular data availability. Experiments involve a trimmed source dataset and an untrimmed target dataset in an adaptation formulation for real-world skeleton-based human action segmentation to evaluate the effectiveness of the proposed method.
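
To make the Stitch idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the simplest possible scheme: trimmed clips are sampled at random and concatenated along the time axis, and each clip's action label is repeated for every frame to obtain frame-wise labels. The names `stitch_clips` and `make_toy_clip`, the 25-joint layout, and the toy clip lengths are hypothetical; the paper's actual stitching scheme may differ, for example in how transitions between clips are handled.

```python
import random


def stitch_clips(clips, labels, num_actions, rng):
    """Concatenate randomly sampled trimmed clips into one multi-action sequence.

    clips:  list of clips, each a list of frames (a frame is a list of joints)
    labels: list of integer action labels, one per clip
    Returns (sequence, frame_labels), both with one entry per frame.
    Assumption: plain concatenation, no smoothing at clip boundaries.
    """
    picks = rng.sample(range(len(clips)), num_actions)
    sequence, frame_labels = [], []
    for i in picks:
        sequence.extend(clips[i])
        # Repeat the clip-level label once per frame of that clip.
        frame_labels.extend([labels[i]] * len(clips[i]))
    return sequence, frame_labels


def make_toy_clip(num_frames, num_joints=25):
    """Hypothetical placeholder clip: every joint sits at the origin."""
    return [[(0.0, 0.0, 0.0)] * num_joints for _ in range(num_frames)]


rng = random.Random(0)
clips = [make_toy_clip(t) for t in (30, 45, 20, 60, 35)]
labels = [0, 1, 2, 3, 4]

sequence, frame_labels = stitch_clips(clips, labels, num_actions=3, rng=rng)
print(len(sequence), len(frame_labels))
```

Because the frame-wise labels are derived from the clip-level labels, a sequence stitched this way comes with segmentation supervision at no extra annotation cost, which is the property the abstract relies on.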

Cite

Text

Tian and Payeur. "Stitch, Contrast, and Segment: Learning a Human Action Segmentation Model Using Trimmed Skeleton Videos." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I7.32792

Markdown

[Tian and Payeur. "Stitch, Contrast, and Segment: Learning a Human Action Segmentation Model Using Trimmed Skeleton Videos." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/tian2025aaai-stitch/) doi:10.1609/AAAI.V39I7.32792

BibTeX

@inproceedings{tian2025aaai-stitch,
  title     = {{Stitch, Contrast, and Segment: Learning a Human Action Segmentation Model Using Trimmed Skeleton Videos}},
  author    = {Tian, Haitao and Payeur, Pierre},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {7365--7373},
  doi       = {10.1609/AAAI.V39I7.32792},
  url       = {https://mlanthology.org/aaai/2025/tian2025aaai-stitch/}
}