Time-Evolving Conditional Character-Centric Graphs for Movie Understanding

Abstract

Temporal graph structure learning for long-term human-centric video understanding is promising but remains challenging due to the scarcity of dense graph annotations for long videos. A desirable capability is therefore to learn the dynamic spatio-temporal interactions of human actors and other objects implicitly from the visual information itself. Toward this goal, we present a novel Time-Evolving Conditional cHaracter-centric graph (TECH) for long-term human-centric video understanding, with application to Movie QA. TECH is inherently a recurrent system of query-conditioned dynamic graphs that evolve over time along the story, following the characters throughout the course of a movie clip. Aiming at human-centric video understanding, TECH uses a two-stage feature refinement process to draw attention to human characters and their interactions while treating interactions with non-human objects as contextual information. Tested on the large-scale TVQA dataset, TECH clearly shows advantages over recent state-of-the-art models.
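
The abstract describes TECH as a recurrent, query-conditioned dynamic graph over characters, refined in two stages, with non-character objects serving as context. The code below is a minimal sketch of that general idea, not the authors' implementation: every module, name, and tensor shape (QueryConditionedGraphStep, ctx_proj, edge_score, node_update, and the choice of attention and GRU updates) is a hypothetical illustration of one recurrent update step under these assumptions.

# Minimal sketch (assumed, not the authors' code) of one recurrent step of a
# query-conditioned character-centric graph: character node states are
# conditioned on the query and on pooled object context (stage 1), messages
# are passed over a learned character-character graph (stage 2), and the
# node states are carried forward recurrently so the graph evolves in time.
import torch
import torch.nn as nn
import torch.nn.functional as F


class QueryConditionedGraphStep(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.ctx_proj = nn.Linear(dim, dim)        # stage 1: context conditioning
        self.edge_score = nn.Linear(2 * dim, 1)    # stage 2: edge weights
        self.node_update = nn.GRUCell(dim, dim)    # recurrent node update

    def forward(self, char_state, obj_feats, query):
        # char_state: (num_chars, dim) recurrent character-node states
        # obj_feats:  (num_objs, dim)  non-character object features (context)
        # query:      (dim,)           question/query embedding
        # Stage 1: pool object context, weighted by relevance to the query.
        ctx_logits = obj_feats @ query                              # (num_objs,)
        ctx = (F.softmax(ctx_logits, dim=0)[:, None] * obj_feats).sum(0)
        cond = torch.tanh(self.ctx_proj(ctx + query))               # conditioning vector

        # Stage 2: dense character graph whose edges depend on conditioned states.
        n = char_state.size(0)
        h = char_state + cond                                       # condition each node
        pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                           h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        adj = F.softmax(self.edge_score(pairs).squeeze(-1), dim=-1)  # (n, n)
        messages = adj @ h                                          # aggregate neighbors

        # Recurrent update: the character-centric graph evolves along the story.
        return self.node_update(messages, char_state)


if __name__ == "__main__":
    dim, num_chars, num_objs, num_frames = 64, 5, 12, 8
    step = QueryConditionedGraphStep(dim)
    state = torch.zeros(num_chars, dim)
    query = torch.randn(dim)
    for _ in range(num_frames):                                     # evolve over time
        state = step(state, torch.randn(num_objs, dim), query)
    print(state.shape)  # torch.Size([5, 64])

In a QA setting such as TVQA, the final character states would typically be pooled and compared against candidate answers; that readout is omitted here to keep the sketch focused on the time-evolving, query-conditioned graph itself.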

Cite

Text

Dang et al. "Time-Evolving Conditional Character-Centric Graphs for Movie Understanding." NeurIPS 2022 Workshops: TGL, 2022.

Markdown

[Dang et al. "Time-Evolving Conditional Character-Centric Graphs for Movie Understanding." NeurIPS 2022 Workshops: TGL, 2022.](https://mlanthology.org/neuripsw/2022/dang2022neuripsw-timeevolving/)

BibTeX

@inproceedings{dang2022neuripsw-timeevolving,
  title     = {{Time-Evolving Conditional Character-Centric Graphs for Movie Understanding}},
  author    = {Dang, Long Hoang and Le, Thao Minh and Le, Vuong and Phuong, Tu Minh and Tran, Truyen},
  booktitle = {NeurIPS 2022 Workshops: TGL},
  year      = {2022},
  url       = {https://mlanthology.org/neuripsw/2022/dang2022neuripsw-timeevolving/}
}