Interpretable Visual Reasoning via Probabilistic Formulation Under Natural Supervision
Abstract
Visual reasoning is crucial for visual question answering (VQA). However, without labelled programs, implicit reasoning under natural supervision remains quite challenging, and previous models are hard to interpret. In this paper, we rethink the implicit reasoning process in VQA and propose a new formulation that maximizes the log-likelihood of the joint distribution over the observed question and the predicted answer. Accordingly, we derive a Temporal Reasoning Network (TRN) framework which models the implicit reasoning process as sequential planning in latent space. Our model is interpretable both in its probabilistic model design and in its reasoning process via visualization. We experimentally demonstrate that TRN supports implicit reasoning across various datasets. Its results are competitive with existing implicit reasoning models and surpass the baseline by a large margin on complicated reasoning tasks, without extra computation cost in the forward pass.
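As a rough sketch of the objective described above (the notation here is ours, not the paper's: $V$ denotes the image, $Q$ the question, $A$ the answer, and $z_{1:T}$ a latent sequence of reasoning steps; the exact factorization TRN uses is an assumption of this sketch), maximizing the joint log-likelihood with a latent reasoning sequence could take the form:

```latex
\log p(Q, A \mid V)
  \;=\; \log \sum_{z_{1:T}} p(A \mid z_{1:T}, Q, V)\, p(Q, z_{1:T} \mid V)
```

In practice, sums over latent sequences of this kind are typically intractable and are optimized via a variational lower bound or a sampling-based approximation; which approximation TRN adopts is not stated in this abstract.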
Cite
Text
Han et al. "Interpretable Visual Reasoning via Probabilistic Formulation Under Natural Supervision." Proceedings of the European Conference on Computer Vision (ECCV), 2020. doi:10.1007/978-3-030-58545-7_32
Markdown
[Han et al. "Interpretable Visual Reasoning via Probabilistic Formulation Under Natural Supervision." Proceedings of the European Conference on Computer Vision (ECCV), 2020.](https://mlanthology.org/eccv/2020/han2020eccv-interpretable/) doi:10.1007/978-3-030-58545-7_32
BibTeX
@inproceedings{han2020eccv-interpretable,
title = {{Interpretable Visual Reasoning via Probabilistic Formulation Under Natural Supervision}},
author = {Han, Xinzhe and Wang, Shuhui and Su, Chi and Zhang, Weigang and Huang, Qingming and Tian, Qi},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2020},
doi = {10.1007/978-3-030-58545-7_32},
url = {https://mlanthology.org/eccv/2020/han2020eccv-interpretable/}
}