Timestamp-Supervised Action Segmentation from the Perspective of Clustering
Abstract
Video action segmentation under timestamp supervision has recently received much attention due to its lower annotation costs. Most existing methods generate pseudo-labels for all frames in each video to train the segmentation model. However, these methods suffer from incorrect pseudo-labels, especially for the semantically unclear frames in the transition region between two consecutive actions, which we call ambiguous intervals. To address this issue, we propose a novel framework from the perspective of clustering, which consists of two parts. First, pseudo-label ensembling generates incomplete but high-quality pseudo-label sequences, in which the frames in ambiguous intervals have no pseudo-labels. Second, iterative clustering propagates the pseudo-labels to the ambiguous intervals step by step, and the updated pseudo-label sequences are used to train the model. We further introduce a clustering loss, which encourages the features of frames within the same action segment to be more compact. Extensive experiments show the effectiveness of our method.
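The idea of propagating pseudo-labels into an ambiguous interval by clustering can be illustrated with a small sketch. This is not the authors' implementation: it is a one-pass simplification that assumes per-frame features are given as a NumPy array, that unlabeled frames are marked with a hypothetical `UNLABELED = -1`, and that each ambiguous interval is split at a single boundary between the centroids of its two neighbouring labeled segments. The function name `fill_ambiguous_intervals` is likewise made up for illustration; the method described in the abstract is iterative and interleaved with model training.

```python
import numpy as np

UNLABELED = -1  # hypothetical marker for frames inside an ambiguous interval


def fill_ambiguous_intervals(features, labels):
    """Split every unlabeled run between its two neighbouring segments.

    features: (T, D) array of frame features.
    labels:   (T,) integer array, UNLABELED where no pseudo-label exists.
    Returns a new label array; runs touching the video boundary are left as-is.
    """
    labels = labels.copy()
    T = len(labels)
    t = 0
    while t < T:
        if labels[t] != UNLABELED:
            t += 1
            continue
        start = t
        while t < T and labels[t] == UNLABELED:
            t += 1
        end = t  # the unlabeled run is [start, end)
        if start == 0 or end == T:
            continue  # no labeled neighbour on one side

        left, right = labels[start - 1], labels[end]
        # Extent of the contiguous labeled segment on each side of the run.
        seg_l = start - 1
        while seg_l > 0 and labels[seg_l - 1] == left:
            seg_l -= 1
        seg_r = end
        while seg_r + 1 < T and labels[seg_r + 1] == right:
            seg_r += 1

        # Centroids of the two neighbouring segments act as cluster centres.
        c_left = features[seg_l:start].mean(axis=0)
        c_right = features[end:seg_r + 1].mean(axis=0)

        gap = features[start:end]
        d_left = ((gap - c_left) ** 2).sum(axis=1)
        d_right = ((gap - c_right) ** 2).sum(axis=1)

        # cost[k]: first k gap frames join the left cluster, the rest the right.
        prefix = np.concatenate(([0.0], np.cumsum(d_left)))
        suffix = np.concatenate((np.cumsum(d_right[::-1])[::-1], [0.0]))
        k = int(np.argmin(prefix + suffix))

        labels[start:start + k] = left
        labels[start + k:end] = right
    return labels
```

Restricting each interval to a single split point keeps the result temporally consistent (one action change per interval), which a plain nearest-centroid assignment would not guarantee.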
Cite
Text
Du et al. "Timestamp-Supervised Action Segmentation from the Perspective of Clustering." International Joint Conference on Artificial Intelligence, 2023. doi:10.24963/IJCAI.2023/77
Markdown
[Du et al. "Timestamp-Supervised Action Segmentation from the Perspective of Clustering." International Joint Conference on Artificial Intelligence, 2023.](https://mlanthology.org/ijcai/2023/du2023ijcai-timestamp/) doi:10.24963/IJCAI.2023/77
BibTeX
@inproceedings{du2023ijcai-timestamp,
title = {{Timestamp-Supervised Action Segmentation from the Perspective of Clustering}},
author = {Du, Dazhao and Li, Enhan and Si, Lingyu and Xu, Fanjiang and Sun, Fuchun},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2023},
pages = {690-698},
doi = {10.24963/IJCAI.2023/77},
url = {https://mlanthology.org/ijcai/2023/du2023ijcai-timestamp/}
}