Random Walks for Temporal Action Segmentation with Timestamp Supervision

Abstract

Temporal action segmentation is a high-level video understanding task, commonly formulated as frame-wise classification of untrimmed videos into predefined actions. Fully supervised deep-learning approaches require dense video annotations, which are time-consuming and costly to produce. Furthermore, the temporal boundaries between consecutive actions are typically not well defined, leading to inherent ambiguity and inter-rater disagreement. A promising approach to remedy these limitations is timestamp supervision, which requires only one labeled frame per action instance in a training video. In this work, we reformulate the task of temporal segmentation as a graph segmentation problem with weakly labeled vertices. We introduce an efficient segmentation method based on random walks on graphs, obtained by solving a sparse system of linear equations. Furthermore, the proposed technique can be employed in any one or any combination of the following forms: (1) as a standalone solution for generating dense pseudo-labels from timestamps; (2) as a training loss; (3) as a smoothing mechanism given intermediate predictions. Extensive experiments on three datasets (50Salads, Breakfast, GTEA) show that our method is competitive with the state of the art and allows the identification of regions of uncertainty around action boundaries.
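
The idea of turning timestamps into dense pseudo-labels with a random walk can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the simplest possible graph, a chain in which each frame is linked only to its temporal neighbors, with Gaussian edge weights computed from hypothetical per-frame feature vectors. The function names and the `beta` and `eps` parameters are illustrative assumptions. On a chain the linear system is tridiagonal (hence sparse), so the sketch solves it with the Thomas algorithm in place of a general sparse solver.

```python
import math


def chain_weights(features, beta=1.0, eps=1e-6):
    """Gaussian similarity between consecutive frames (assumed weighting)."""
    weights = []
    for f, g in zip(features, features[1:]):
        sq_dist = sum((a - b) ** 2 for a, b in zip(f, g))
        # eps keeps every edge strictly positive so the system stays solvable.
        weights.append(math.exp(-beta * sq_dist) + eps)
    return weights


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm: O(n) solve of a tridiagonal linear system."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def random_walk_probabilities(features, timestamps, beta=1.0):
    """For each label, the probability that a walker started at each frame
    first reaches a timestamp carrying that label.

    timestamps maps frame index -> label and must contain at least one entry.
    """
    n = len(features)
    w = chain_weights(features, beta)
    probs = {}
    for label in sorted(set(timestamps.values())):
        lower, diag, upper, rhs = ([0.0] * n for _ in range(4))
        for t in range(n):
            if t in timestamps:
                # Labeled frame: probability is fixed to 1 or 0.
                diag[t] = 1.0
                rhs[t] = 1.0 if timestamps[t] == label else 0.0
            else:
                # Unlabeled frame: graph Laplacian row (weighted-average condition).
                left = w[t - 1] if t > 0 else 0.0
                right = w[t] if t < n - 1 else 0.0
                lower[t], diag[t], upper[t] = -left, left + right, -right
        probs[label] = solve_tridiagonal(lower, diag, upper, rhs)
    return probs


def pseudo_labels(probs):
    """Dense labels: the most probable label at every frame."""
    labels = list(probs)
    n = len(probs[labels[0]])
    return [max(labels, key=lambda c: probs[c][t]) for t in range(n)]


def uncertainty(probs):
    """One simple uncertainty score per frame: 1 - highest label probability."""
    n = len(next(iter(probs.values())))
    return [1.0 - max(p[t] for p in probs.values()) for t in range(n)]


# Toy video: six frames with 1-D features, one timestamp per action instance.
features = [[0.0], [0.1], [0.0], [1.0], [1.1], [0.9]]
timestamps = {1: "cut", 4: "mix"}
probs = random_walk_probabilities(features, timestamps, beta=5.0)
print(pseudo_labels(probs))
print(uncertainty(probs))
```

In this toy setting, frames whose features resemble a labeled frame inherit its label, and the per-frame score from `uncertainty` is largest where the label probabilities are closest to each other, which is one simple way to flag ambiguous regions near action boundaries.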

Cite

Text

Hirsch et al. "Random Walks for Temporal Action Segmentation with Timestamp Supervision." Winter Conference on Applications of Computer Vision, 2024.

Markdown

[Hirsch et al. "Random Walks for Temporal Action Segmentation with Timestamp Supervision." Winter Conference on Applications of Computer Vision, 2024.](https://mlanthology.org/wacv/2024/hirsch2024wacv-random/)

BibTeX

@inproceedings{hirsch2024wacv-random,
  title     = {{Random Walks for Temporal Action Segmentation with Timestamp Supervision}},
  author    = {Hirsch, Roy and Cohen, Regev and Golany, Tomer and Freedman, Daniel and Rivlin, Ehud},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2024},
  pages     = {6614--6624},
  url       = {https://mlanthology.org/wacv/2024/hirsch2024wacv-random/}
}