Understanding Videos, Constructing Plots: Learning a Visually Grounded Storyline Model from Annotated Videos

Gupta, Abhinav; Srinivasan, Praveen; Shi, Jianbo; Davis, Larry S.

doi:10.1109/CVPR.2009.5206492

CVPR 2009 pp. 2012-2019

Abstract

Analyzing videos of human activities involves not only recognizing actions (typically based on their appearances), but also determining the story/plot of the video. The storyline of a video describes causal relationships between actions. Beyond recognition of individual actions, discovering causal relationships helps to better understand the semantic meaning of the activities. We present an approach to learn a visually grounded storyline model of videos directly from weakly labeled data. The storyline model is represented as an AND-OR graph, a structure that can compactly encode storyline variation across videos. The edges in the AND-OR graph correspond to causal relationships which are represented in terms of spatio-temporal constraints. We formulate an Integer Programming framework for action recognition and storyline extraction using the storyline model and visual groundings learned from training data.
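To make the AND-OR graph idea concrete, here is a minimal sketch of how such a structure can compactly encode several storylines at once. It is an illustration only, not the authors' implementation: the `Node` class, the `storylines` function, and the action labels are all hypothetical, the structure is simplified to a tree, and the spatio-temporal constraints on edges and the Integer Programming inference described in the abstract are left out.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import List


@dataclass
class Node:
    """Hypothetical AND-OR node; not the paper's data structure."""
    kind: str                      # "AND", "OR", or "LEAF"
    label: str = ""                # action name, used only by leaves
    children: List["Node"] = field(default_factory=list)


def storylines(node: Node) -> List[List[str]]:
    """Enumerate every action sequence the (tree-shaped) graph can generate."""
    if node.kind == "LEAF":
        return [[node.label]]
    if node.kind == "OR":
        # An OR node selects exactly one of its alternatives.
        return [s for child in node.children for s in storylines(child)]
    if node.kind == "AND":
        # An AND node requires all children, concatenated in order.
        options = [storylines(child) for child in node.children]
        return [[a for part in combo for a in part] for combo in product(*options)]
    raise ValueError(f"unknown node kind: {node.kind}")


# Toy example with made-up action labels (assumption, not from the paper).
pitch = Node("LEAF", "pitch")
hit_then_run = Node("AND", children=[Node("LEAF", "hit"), Node("LEAF", "run")])
miss_then_catch = Node("AND", children=[Node("LEAF", "miss"), Node("LEAF", "catch")])
root = Node("AND", children=[pitch, Node("OR", children=[hit_then_run, miss_then_catch])])

for line in storylines(root):
    print(" -> ".join(line))
```

In this toy graph a single OR node captures the variation between two storylines that share the same opening action, which is the kind of compactness the abstract refers to.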

Cite

Text

Gupta et al. "Understanding Videos, Constructing Plots: Learning a Visually Grounded Storyline Model from Annotated Videos." IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2009. doi:10.1109/CVPR.2009.5206492

Markdown

[Gupta et al. "Understanding Videos, Constructing Plots: Learning a Visually Grounded Storyline Model from Annotated Videos." IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2009.](https://mlanthology.org/cvpr/2009/gupta2009cvpr-understanding/) doi:10.1109/CVPR.2009.5206492

BibTeX

@inproceedings{gupta2009cvpr-understanding,
  title     = {{Understanding Videos, Constructing Plots: Learning a Visually Grounded Storyline Model from Annotated Videos}},
  author    = {Gupta, Abhinav and Srinivasan, Praveen and Shi, Jianbo and Davis, Larry S.},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2009},
  pages     = {2012-2019},
  doi       = {10.1109/CVPR.2009.5206492},
  url       = {https://mlanthology.org/cvpr/2009/gupta2009cvpr-understanding/}
}