Stitch, Contrast, and Segment: Learning a Human Action Segmentation Model Using Trimmed Skeleton Videos
Abstract
Existing skeleton-based human action classification models rely on well-trimmed, action-specific skeleton videos for both training and testing, which limits their scalability to real-world applications where untrimmed videos containing concatenated actions predominate. To overcome this limitation, recently introduced skeleton action segmentation models incorporate untrimmed skeleton videos into end-to-end training. Such a model is optimized to provide frame-wise predictions for testing videos of any length, simultaneously realizing action localization and classification. Yet this improvement requires frame-wise annotated skeleton videos, which remain time-consuming to produce in practice. This paper presents a novel framework for skeleton-based action segmentation that is trained on short trimmed skeleton videos but can run on longer untrimmed videos. The approach is implemented in three steps: Stitch, Contrast, and Segment. First, Stitch proposes a temporal skeleton stitching scheme that treats trimmed skeleton videos as elementary human motions that compose a semantic space and can be sampled to generate multi-action stitched sequences. Second, Contrast learns contrastive representations from stitched sequences with a novel discrimination pretext task that enables a skeleton encoder to learn meaningful action-temporal contexts to improve action segmentation. Finally, Segment relates the proposed method to action segmentation by learning a segmentation layer while handling particular data availability. Experiments involve a trimmed source dataset and an untrimmed target dataset in an adaptation formulation for real-world skeleton-based human action segmentation to evaluate the effectiveness of the proposed method.
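The stitching idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `stitch_clips`, the uniform sampling without replacement, the plain concatenation, and the toy data layout (25 joints with 3D coordinates) are all assumptions made for illustration. It only shows how trimmed, single-action clips could be sampled and joined into one multi-action sequence whose frame-wise labels come for free from the clip-level labels.

```python
import random


def stitch_clips(clips, labels, num_actions, rng):
    """Join randomly sampled trimmed clips into one multi-action sequence.

    clips: list of clips, each a list of frames (a frame is a list of joints).
    labels: one class id per clip.
    Returns the stitched frame list and a matching per-frame label list.
    Hypothetical helper: uniform sampling is an assumption, not the paper's scheme.
    """
    picks = rng.sample(range(len(clips)), num_actions)
    sequence, frame_labels = [], []
    for i in picks:
        sequence.extend(clips[i])
        # Every frame of a trimmed clip inherits that clip's action label.
        frame_labels.extend([labels[i]] * len(clips[i]))
    return sequence, frame_labels


def make_clip(num_frames, rng, num_joints=25):
    """Build a toy clip of random 3D joint positions (illustrative data only)."""
    return [
        [(rng.random(), rng.random(), rng.random()) for _ in range(num_joints)]
        for _ in range(num_frames)
    ]


rng = random.Random(0)
clips = [make_clip(n, rng) for n in (30, 45, 20, 60, 35)]
labels = [0, 1, 2, 3, 4]
sequence, frame_labels = stitch_clips(clips, labels, num_actions=3, rng=rng)
print(len(sequence), len(frame_labels))
```

Because each trimmed clip carries a single action label, the stitched sequence has dense frame-wise supervision without any manual frame-level annotation, which is the property the abstract relies on.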
Cite
Text
Tian and Payeur. "Stitch, Contrast, and Segment: Learning a Human Action Segmentation Model Using Trimmed Skeleton Videos." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I7.32792
Markdown
[Tian and Payeur. "Stitch, Contrast, and Segment: Learning a Human Action Segmentation Model Using Trimmed Skeleton Videos." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/tian2025aaai-stitch/) doi:10.1609/AAAI.V39I7.32792
BibTeX
@inproceedings{tian2025aaai-stitch,
title = {{Stitch, Contrast, and Segment: Learning a Human Action Segmentation Model Using Trimmed Skeleton Videos}},
author = {Tian, Haitao and Payeur, Pierre},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2025},
pages = {7365-7373},
doi = {10.1609/AAAI.V39I7.32792},
url = {https://mlanthology.org/aaai/2025/tian2025aaai-stitch/}
}