Visual Semantic Role Labeling for Video Understanding

Abstract

We propose a new framework for understanding and representing related salient events in a video using visual semantic role labeling. We represent videos as a set of related events, wherein each event consists of a verb and multiple entities that fulfill various roles relevant to that event. To study the challenging task of semantic role labeling in videos, or VidSRL, we introduce the VidSitu benchmark, a large-scale video understanding data source with 27K 10-second movie clips, each richly annotated with a verb and semantic roles every 2 seconds. Entities are co-referenced across events within a movie clip, and events are connected to each other via event-event relations. Clips in VidSitu are drawn from a large collection of movies (~3K) and have been chosen to be both complex (~4.2 unique verbs within a video) and diverse (~200 verbs have more than 100 annotations each). We provide a comprehensive analysis of the dataset in comparison to other publicly available video understanding benchmarks, present several illustrative baselines, and evaluate a range of standard video recognition models. Our code and dataset will be released publicly.
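The representation described in the abstract can be sketched as a small data structure: a 10-second clip split into five 2-second events, each a verb plus role-to-entity pairs, with entities shared (co-referenced) across events and events linked by relations. This is a minimal illustration under stated assumptions, not the VidSitu release format. The class and method names, the PropBank-style role labels such as `Arg0`, and any relation label like `enables` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Per the abstract: 10-second clips with a verb annotated every 2 seconds.
CLIP_SECONDS = 10
EVENT_SECONDS = 2
NUM_EVENTS = CLIP_SECONDS // EVENT_SECONDS  # 5 events per clip


@dataclass
class Event:
    start: int              # window start, in seconds
    verb: str               # verb for this 2-second window
    roles: dict[str, str]   # hypothetical role label -> entity id


@dataclass
class Clip:
    clip_id: str
    entities: dict[str, str]  # entity id -> text description
    events: list[Event]
    # (event_i, event_j, label) triples; labels are illustrative only
    relations: list[tuple[int, int, str]] = field(default_factory=list)

    def validate(self) -> None:
        """Check the 5 x 2-second layout and that every reference resolves."""
        if len(self.events) != NUM_EVENTS:
            raise ValueError(f"expected {NUM_EVENTS} events, got {len(self.events)}")
        for k, ev in enumerate(self.events):
            if ev.start != k * EVENT_SECONDS:
                raise ValueError(f"event {k} starts at {ev.start}s")
            for role, ent in ev.roles.items():
                if ent not in self.entities:
                    raise ValueError(f"event {k} role {role} -> unknown entity {ent}")
        for i, j, _ in self.relations:
            if not (0 <= i < NUM_EVENTS and 0 <= j < NUM_EVENTS):
                raise ValueError(f"relation ({i}, {j}) out of range")

    def unique_verbs(self) -> int:
        """Number of distinct verbs in this clip."""
        return len({ev.verb for ev in self.events})

    def coreference_chains(self) -> dict[str, list[int]]:
        """Map each entity id to the event indices where it fills some role."""
        chains: dict[str, list[int]] = {}
        for k, ev in enumerate(self.events):
            for ent in ev.roles.values():
                chain = chains.setdefault(ent, [])
                if not chain or chain[-1] != k:  # skip duplicates within one event
                    chain.append(k)
        return chains


def mean_unique_verbs(clips: list[Clip]) -> float:
    """Average distinct verbs per clip, the kind of complexity statistic the abstract reports."""
    return sum(c.unique_verbs() for c in clips) / len(clips)


if __name__ == "__main__":
    clip = Clip(
        clip_id="demo",
        entities={"E1": "man in a suit", "E2": "door", "E3": "woman"},
        events=[
            Event(0, "walk", {"Arg0": "E1"}),
            Event(2, "open", {"Arg0": "E1", "Arg1": "E2"}),
            Event(4, "open", {"Arg0": "E1", "Arg1": "E2"}),
            Event(6, "greet", {"Arg0": "E3", "Arg1": "E1"}),
            Event(8, "talk", {"Arg0": "E1", "Arg2": "E3"}),
        ],
        relations=[(1, 3, "enables")],
    )
    clip.validate()
    print(clip.unique_verbs())              # 4
    print(clip.coreference_chains()["E1"])  # [0, 1, 2, 3, 4]
```

Storing roles as references to entity ids, rather than as free text per event, is what makes co-reference explicit: the same id appearing in several events marks the same entity, so chains fall out of a single pass over the events.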

Cite

Text

Sadhu et al. "Visual Semantic Role Labeling for Video Understanding." Conference on Computer Vision and Pattern Recognition, 2021. doi:10.1109/CVPR46437.2021.00554

Markdown

[Sadhu et al. "Visual Semantic Role Labeling for Video Understanding." Conference on Computer Vision and Pattern Recognition, 2021.](https://mlanthology.org/cvpr/2021/sadhu2021cvpr-visual/) doi:10.1109/CVPR46437.2021.00554

BibTeX

@inproceedings{sadhu2021cvpr-visual,
  title     = {{Visual Semantic Role Labeling for Video Understanding}},
  author    = {Sadhu, Arka and Gupta, Tanmay and Yatskar, Mark and Nevatia, Ram and Kembhavi, Aniruddha},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2021},
  pages     = {5589--5600},
  doi       = {10.1109/CVPR46437.2021.00554},
  url       = {https://mlanthology.org/cvpr/2021/sadhu2021cvpr-visual/}
}