Video-Mined Task Graphs for Keystep Recognition in Instructional Videos
Abstract
Procedural activity understanding requires perceiving human actions in terms of a broader task, where multiple keysteps are performed in sequence across a long video to reach a final goal state, such as the steps of a recipe or the steps of a DIY fix-it task. Prior work largely treats keystep recognition in isolation from this broader structure, or else rigidly confines keysteps to align with a particular sequential script. We propose discovering a task graph automatically from how-to videos to represent probabilistically how people tend to execute keysteps, then leverage this graph to regularize keystep recognition in novel videos. On multiple datasets of real-world instructional video, we show the impact: more reliable zero-shot keystep localization and improved video representation learning, exceeding the state of the art.
Cite
Text
Ashutosh et al. "Video-Mined Task Graphs for Keystep Recognition in Instructional Videos." Neural Information Processing Systems, 2023.
Markdown
[Ashutosh et al. "Video-Mined Task Graphs for Keystep Recognition in Instructional Videos." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/ashutosh2023neurips-videomined/)
BibTeX
@inproceedings{ashutosh2023neurips-videomined,
title = {{Video-Mined Task Graphs for Keystep Recognition in Instructional Videos}},
author = {Ashutosh, Kumar and Ramakrishnan, Santhosh Kumar and Afouras, Triantafyllos and Grauman, Kristen},
booktitle = {Neural Information Processing Systems},
year = {2023},
url = {https://mlanthology.org/neurips/2023/ashutosh2023neurips-videomined/}
}