Time-Evolving Conditional Character-Centric Graphs for Movie Understanding
Abstract
Temporal graph structure learning for long-term human-centric video understanding is promising but remains challenging due to the scarcity of dense graph annotations for long videos. The desired capability is to learn the dynamic spatio-temporal interactions of human actors and other objects implicitly from the visual information itself. Toward this goal, we present a novel Time-Evolving Conditional cHaracter-centric graph (TECH) for long-term human-centric video understanding, with application to Movie QA. TECH is inherently a recurrent system of query-conditioned dynamic graphs that evolve over time along the story, following the course of a movie clip. Aiming at human-centric video understanding, TECH uses a two-stage feature refinement process to draw attention to human characters and their interactions while treating interactions with non-human objects as contextual information. Tested on the large-scale TVQA dataset, TECH clearly shows advantages over recent state-of-the-art models.
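To make the high-level design concrete, below is a minimal sketch of how a recurrent, query-conditioned, character-centric graph update could be organized. It is not the authors' implementation: the module names (QueryConditionedGraphLayer, TECHSketch), the attention-style adjacency, the mean-pooled object context, and all dimensions are illustrative assumptions drawn only from the abstract's description of query conditioning, temporal recurrence, and the two-stage (characters first, objects as context) refinement.

import torch
import torch.nn as nn
import torch.nn.functional as F


class QueryConditionedGraphLayer(nn.Module):
    """One message-passing step whose edge weights are conditioned on the query."""

    def __init__(self, dim: int):
        super().__init__()
        self.key = nn.Linear(dim, dim)
        self.query_proj = nn.Linear(dim, dim)
        self.update = nn.GRUCell(dim, dim)

    def forward(self, nodes: torch.Tensor, query: torch.Tensor) -> torch.Tensor:
        # nodes: (N, dim) character features; query: (dim,) question embedding.
        q = self.query_proj(query)                        # condition the graph on the query
        scores = (self.key(nodes) + q) @ nodes.t() / nodes.size(-1) ** 0.5
        adj = F.softmax(scores, dim=-1)                   # soft, query-conditioned adjacency
        messages = adj @ nodes                            # aggregate neighbor information
        return self.update(messages, nodes)               # recurrent node refinement


class TECHSketch(nn.Module):
    """Two-stage refinement: characters first, non-human objects as context."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.character_graph = QueryConditionedGraphLayer(dim)
        self.context_fuse = nn.Linear(2 * dim, dim)
        self.story_rnn = nn.GRUCell(dim, dim)             # evolves the clip-level state

    def forward(self, characters_per_step, objects_per_step, query):
        # characters_per_step / objects_per_step: lists of (N_t, dim) / (M_t, dim) tensors.
        state = torch.zeros(1, query.size(-1))
        for chars_t, objs_t in zip(characters_per_step, objects_per_step):
            # Stage 1: refine character nodes with a query-conditioned graph step.
            chars_t = self.character_graph(chars_t, query)
            # Stage 2: fold in non-human objects as contextual information.
            context = objs_t.mean(dim=0, keepdim=True).expand_as(chars_t)
            chars_t = torch.tanh(self.context_fuse(torch.cat([chars_t, context], dim=-1)))
            # Evolve the clip-level state over time along the story.
            state = self.story_rnn(chars_t.mean(dim=0, keepdim=True), state)
        return state                                      # (1, dim) clip summary for a QA head


# Example usage with random features (3 time steps, hypothetical sizes):
# model = TECHSketch(dim=256)
# characters = [torch.randn(4, 256) for _ in range(3)]
# objects = [torch.randn(6, 256) for _ in range(3)]
# summary = model(characters, objects, torch.randn(256))

In this reading, the per-step character graph handles the first refinement stage, the object-context fusion handles the second, and the outer recurrent cell carries the clip-level state along the story; the resulting summary would then be scored against answer candidates in a QA head.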
Cite
Dang et al. "Time-Evolving Conditional Character-Centric Graphs for Movie Understanding." NeurIPS 2022 Workshops: TGL, 2022. https://mlanthology.org/neuripsw/2022/dang2022neuripsw-timeevolving/

BibTeX
@inproceedings{dang2022neuripsw-timeevolving,
title = {{Time-Evolving Conditional Character-Centric Graphs for Movie Understanding}},
author = {Dang, Long Hoang and Le, Thao Minh and Le, Vuong and Phuong, Tu Minh and Tran, Truyen},
booktitle = {NeurIPS 2022 Workshops: TGL},
year = {2022},
url = {https://mlanthology.org/neuripsw/2022/dang2022neuripsw-timeevolving/}
}