Motion-Appearance Co-Memory Networks for Video Question Answering

Gao, Jiyang; Ge, Runzhou; Chen, Kan; Nevatia, Ram

doi:10.1109/CVPR.2018.00688

CVPR 2018

Abstract

Video Question Answering (QA) is an important task in understanding video temporal structure. We observe that video QA has three unique attributes compared with image QA: (1) it deals with long sequences of images containing richer information, not only in quantity but also in variety; (2) motion and appearance information are usually correlated and can provide useful attention cues to each other; (3) different questions require different numbers of frames to infer the answer. Based on these observations, we propose a motion-appearance co-memory network for video QA. Our network builds on concepts from the Dynamic Memory Network (DMN) and introduces new mechanisms for video QA. Specifically, there are three salient aspects: (1) a co-memory attention mechanism that uses cues from both motion and appearance to generate attention; (2) a temporal conv-deconv network that generates multi-level contextual facts; (3) a dynamic fact ensemble method that constructs the temporal representation dynamically for different questions. We evaluate our method on the TGIF-QA dataset, and the results significantly outperform the state of the art on all four TGIF-QA tasks.
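To make the co-memory attention idea more concrete, here is a minimal sketch in plain Python. It is not the authors' formulation: the additive fusion of question and memories, the scaled dot-product scoring, and the names `attend` and `co_memory_step` are all assumptions chosen for illustration. It only shows the general pattern in which attention over one stream's facts is conditioned on the question and on the memories of both streams.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend(facts, question, own_memory, other_memory):
    """Attention over `facts`, cued by the question and both memories.

    Assumption: the cue is a simple element-wise sum; the paper's actual
    attention function may differ.
    """
    cue = [q + o + c for q, o, c in zip(question, own_memory, other_memory)]
    scale = math.sqrt(len(cue))
    weights = softmax([dot(f, cue) / scale for f in facts])
    # Attention-weighted sum of the facts becomes the updated memory.
    summary = [sum(w * f[i] for w, f in zip(weights, facts))
               for i in range(len(cue))]
    return weights, summary


def co_memory_step(motion_facts, appearance_facts, question,
                   motion_mem, appearance_mem):
    """One hypothetical update: each stream attends using both memories."""
    motion_out = attend(motion_facts, question, motion_mem, appearance_mem)
    appearance_out = attend(appearance_facts, question,
                            appearance_mem, motion_mem)
    return motion_out, appearance_out
```

In this sketch, changing the appearance memory changes the attention weights over the motion facts, which is the cross-stream cueing that the abstract describes at a high level.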

Cite

Text

Gao et al. "Motion-Appearance Co-Memory Networks for Video Question Answering." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018. doi:10.1109/CVPR.2018.00688

Markdown

[Gao et al. "Motion-Appearance Co-Memory Networks for Video Question Answering." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018.](https://mlanthology.org/cvpr/2018/gao2018cvpr-motionappearance/) doi:10.1109/CVPR.2018.00688

BibTeX

@inproceedings{gao2018cvpr-motionappearance,
  title     = {{Motion-Appearance Co-Memory Networks for Video Question Answering}},
  author    = {Gao, Jiyang and Ge, Runzhou and Chen, Kan and Nevatia, Ram},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2018},
  doi       = {10.1109/CVPR.2018.00688},
  url       = {https://mlanthology.org/cvpr/2018/gao2018cvpr-motionappearance/}
}