Hierarchical Modeling for Task Recognition and Action Segmentation in Weakly-Labeled Instructional Videos

Ghoddoosian, Reza; Sayed, Saif; Athitsos, Vassilis

WACV 2022 pp. 1922-1932

Abstract

This paper focuses on task recognition and action segmentation in weakly-labeled instructional videos, where only the ordered sequence of video-level actions is available during training. We propose a two-stream framework, which exploits semantic and temporal hierarchies to recognize top-level tasks in instructional videos. Further, we present a novel top-down weakly-supervised action segmentation approach, where the predicted task is used to constrain the inference of fine-grained action sequences. Experimental results on the popular Breakfast and Cooking 2 datasets show that our two-stream hierarchical task modeling significantly outperforms existing methods in top-level task recognition for all datasets and metrics. Additionally, using our task recognition framework in the proposed top-down action segmentation approach consistently improves the state of the art, while also reducing segmentation inference time by 80-90 percent.

Cite

Text

Ghoddoosian et al. "Hierarchical Modeling for Task Recognition and Action Segmentation in Weakly-Labeled Instructional Videos." Winter Conference on Applications of Computer Vision, 2022.

Markdown

[Ghoddoosian et al. "Hierarchical Modeling for Task Recognition and Action Segmentation in Weakly-Labeled Instructional Videos." Winter Conference on Applications of Computer Vision, 2022.](https://mlanthology.org/wacv/2022/ghoddoosian2022wacv-hierarchical/)

BibTeX

@inproceedings{ghoddoosian2022wacv-hierarchical,
  title     = {{Hierarchical Modeling for Task Recognition and Action Segmentation in Weakly-Labeled Instructional Videos}},
  author    = {Ghoddoosian, Reza and Sayed, Saif and Athitsos, Vassilis},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2022},
  pages     = {1922--1932},
  url       = {https://mlanthology.org/wacv/2022/ghoddoosian2022wacv-hierarchical/}
}