Knowledge-Constrained Answer Generation for Open-Ended Video Question Answering

Yao Jin, Guocheng Niu, Xinyan Xiao, Jian Zhang, Xi Peng, Jun Yu

AAAI 2023 pp. 8141-8149

doi:10.1609/AAAI.V37I7.25983

Abstract

Open-ended video question answering (open-ended VideoQA) aims to understand video content and question semantics in order to generate correct answers. Most of the best-performing models formulate the problem as a discriminative multi-label classification task. In real-world scenarios, however, it is difficult to define a candidate set that includes all possible answers. In this paper, we propose a Knowledge-constrained Generative VideoQA Algorithm (KcGA) with an encoder-decoder pipeline, which enables out-of-domain answer generation through an adaptive external knowledge module and a multi-stream information control mechanism. We use ClipBERT to extract video-question features. To form the external knowledge features, we extract frame-wise, object-level knowledge from a commonsense knowledge base and compute context-aware episode memory units with an attention-based GRU. We then use the multi-stream information control mechanism to fuse the video-question and external knowledge features so that the two streams complement each other semantically and are well aligned. We evaluate our model on two open-ended benchmark datasets and demonstrate that it can effectively and robustly generate high-quality answers without being limited to the answer set of the training data.
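The abstract names two mechanisms (an attention-based GRU that builds episode memory from external knowledge, and a mechanism that fuses the knowledge stream with the video-question stream) without giving their equations. The sketch below illustrates both under stated assumptions and is not the KcGA implementation. The attention GRU follows the DMN+ formulation (Xiong et al., 2016), in which a scalar attention gate replaces the GRU update gate. The fusion step is a hypothetical per-dimension sigmoid gate standing in for the multi-stream information control mechanism. Dot-product attention, the toy dimensions, the random weights, and all names (`AttentionGRU`, `gated_fusion`, `rand_mat`) are illustrative assumptions.

```python
import math
import random
from typing import List

Vec = List[float]
Mat = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m: Mat, v: Vec) -> Vec:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def add(*vs: Vec) -> Vec:
    return [sum(xs) for xs in zip(*vs)]


def rand_mat(rows: int, cols: int, rng: random.Random) -> Mat:
    # Hypothetical random weights; a real model would learn these.
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class AttentionGRU:
    """DMN+-style attention GRU (an assumption, not KcGA's exact cell).

    The scalar attention gate g_i of each fact replaces the usual update gate:
        h_i = g_i * h~_i + (1 - g_i) * h_{i-1}
    """

    def __init__(self, dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.dim = dim
        self.W_r, self.U_r = rand_mat(dim, dim, rng), rand_mat(dim, dim, rng)
        self.W, self.U = rand_mat(dim, dim, rng), rand_mat(dim, dim, rng)

    def attention(self, facts: List[Vec], query: Vec) -> List[float]:
        # Softmax over dot-product scores between each fact and the query.
        scores = [sum(f * q for f, q in zip(fact, query)) for fact in facts]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def episode(self, facts: List[Vec], query: Vec) -> Vec:
        gates = self.attention(facts, query)
        h = [0.0] * self.dim
        for fact, g in zip(facts, gates):
            # Reset gate r_i = sigmoid(W_r f_i + U_r h_{i-1}).
            r = [sigmoid(x) for x in add(matvec(self.W_r, fact), matvec(self.U_r, h))]
            # Candidate h~_i = tanh(W f_i + r_i * (U h_{i-1})).
            uh = matvec(self.U, h)
            h_tilde = [math.tanh(a + ri * b) for a, ri, b in zip(matvec(self.W, fact), r, uh)]
            # The attention gate decides how much this fact overwrites memory.
            h = [g * ht + (1.0 - g) * hp for ht, hp in zip(h_tilde, h)]
        return h


def gated_fusion(vq: Vec, know: Vec, W_z: Mat) -> Vec:
    # Hypothetical gate z = sigmoid(W_z [vq; know]) picks, per dimension,
    # how much of the video-question stream vs. the knowledge stream to keep.
    z = [sigmoid(x) for x in matvec(W_z, vq + know)]
    return [zi * a + (1.0 - zi) * b for zi, a, b in zip(z, vq, know)]


# Toy usage: 3 per-frame knowledge vectors attended by a question vector.
dim = 4
rng = random.Random(1)
facts = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(3)]
question = [rng.uniform(-1, 1) for _ in range(dim)]
memory = AttentionGRU(dim).episode(facts, question)
fused = gated_fusion(question, memory, rand_mat(dim, 2 * dim, rng))
print([round(x, 3) for x in fused])
```

Gating is used here because it keeps the fused vector a per-dimension convex combination of the two streams, which is one simple way to let the model lean on external knowledge only where it helps; the paper's actual control mechanism may differ.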


Cite

Text

Jin et al. "Knowledge-Constrained Answer Generation for Open-Ended Video Question Answering." AAAI Conference on Artificial Intelligence, 2023. doi:10.1609/AAAI.V37I7.25983

Markdown

[Jin et al. "Knowledge-Constrained Answer Generation for Open-Ended Video Question Answering." AAAI Conference on Artificial Intelligence, 2023.](https://mlanthology.org/aaai/2023/jin2023aaai-knowledge/) doi:10.1609/AAAI.V37I7.25983

BibTeX

@inproceedings{jin2023aaai-knowledge,
  title     = {{Knowledge-Constrained Answer Generation for Open-Ended Video Question Answering}},
  author    = {Jin, Yao and Niu, Guocheng and Xiao, Xinyan and Zhang, Jian and Peng, Xi and Yu, Jun},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2023},
  pages     = {8141-8149},
  doi       = {10.1609/AAAI.V37I7.25983},
  url       = {https://mlanthology.org/aaai/2023/jin2023aaai-knowledge/}
}