Interpretable Visual Reasoning via Probabilistic Formulation Under Natural Supervision
Abstract
Visual reasoning is crucial for visual question answering (VQA). However, without labelled programs, implicit reasoning under natural supervision remains quite challenging, and previous models are hard to interpret. In this paper, we rethink the implicit reasoning process in VQA and propose a new formulation that maximizes the log-likelihood of the joint distribution over the observed question and the predicted answer. Accordingly, we derive a Temporal Reasoning Network (TRN) framework which models the implicit reasoning process as sequential planning in latent space. Our model is interpretable both in its probabilistic model design and in its reasoning process via visualization. We experimentally demonstrate that TRN supports implicit reasoning across various datasets. Its results are competitive with existing implicit reasoning models and surpass the baseline by a large margin on complicated reasoning tasks, without extra computation cost in the forward pass.
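As a rough sketch of the objective described above (the notation here is ours, not the paper's: $V$ denotes the image, $Q$ the question, $A$ the answer, and $z_{1:T}$ a latent sequence of reasoning steps; the exact factorization TRN uses is an assumption of this sketch), maximizing the joint log-likelihood with a latent reasoning sequence could take the form:

```latex
\log p(Q, A \mid V)
  \;=\; \log \sum_{z_{1:T}} p(A \mid z_{1:T}, Q, V)\, p(Q, z_{1:T} \mid V)
```

In practice, sums over latent sequences of this kind are typically intractable and are optimized via a variational lower bound or a sampling-based approximation; which approximation TRN adopts is not stated in this abstract.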
Cite
Text
Han et al. "Interpretable Visual Reasoning via Probabilistic Formulation Under Natural Supervision." Proceedings of the European Conference on Computer Vision (ECCV), 2020. doi:10.1007/978-3-030-58545-7_32
Markdown
[Han et al. "Interpretable Visual Reasoning via Probabilistic Formulation Under Natural Supervision." Proceedings of the European Conference on Computer Vision (ECCV), 2020.](https://mlanthology.org/eccv/2020/han2020eccv-interpretable/) doi:10.1007/978-3-030-58545-7_32
BibTeX
@inproceedings{han2020eccv-interpretable,
title = {{Interpretable Visual Reasoning via Probabilistic Formulation Under Natural Supervision}},
author = {Han, Xinzhe and Wang, Shuhui and Su, Chi and Zhang, Weigang and Huang, Qingming and Tian, Qi},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2020},
doi = {10.1007/978-3-030-58545-7_32},
url = {https://mlanthology.org/eccv/2020/han2020eccv-interpretable/}
}