On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering
Abstract
Visual Question Answering (VQA) methods have made incredible progress, but suffer from a failure to generalize. This is visible in the fact that they are vulnerable to learning coincidental correlations in the data rather than deeper relations between image content and ideas expressed in language. We present a dataset that takes a step towards addressing this problem in that it contains questions expressed in two languages, and an evaluation process that co-opts a well understood image-based metric to reflect the method's ability to reason. Measuring reasoning directly encourages generalization by penalizing answers that are coincidentally correct. The dataset reflects the scene-text version of the VQA problem, and the reasoning evaluation can be seen as a text-based version of a referring expression challenge. Experiments and analyses are provided that show the value of the dataset. The dataset is available at www.est-vqa.org.
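The idea of penalizing answers that are only coincidentally correct can be illustrated with a minimal sketch. The abstract does not name the image-based metric, so this sketch assumes it is intersection-over-union (IoU) between a predicted evidence box and an annotated one, with a hypothetical threshold of 0.5 and a simple case-insensitive string match for the answer. The function names and box format are illustrative, not taken from the dataset's tooling.

```python
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def evidence_aware_correct(
    pred_answer: str,
    gold_answer: str,
    pred_box: Box,
    gold_box: Box,
    iou_threshold: float = 0.5,  # assumed value, not stated in the abstract
) -> bool:
    """Count a prediction as correct only if the answer matches AND the
    predicted evidence region overlaps the annotated one sufficiently."""
    answer_ok = pred_answer.strip().lower() == gold_answer.strip().lower()
    return answer_ok and iou(pred_box, gold_box) >= iou_threshold
```

Under this scheme, a model that guesses the right answer string while pointing at the wrong region of the image receives no credit, which is the sense in which the evaluation reflects reasoning rather than answer accuracy alone.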
Cite
Text
Wang et al. "On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020. doi:10.1109/CVPR42600.2020.01014
Markdown
[Wang et al. "On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.](https://mlanthology.org/cvpr/2020/wang2020cvpr-general/) doi:10.1109/CVPR42600.2020.01014
BibTeX
@inproceedings{wang2020cvpr-general,
title = {{On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering}},
author = {Wang, Xinyu and Liu, Yuliang and Shen, Chunhua and Ng, Chun Chet and Luo, Canjie and Jin, Lianwen and Chan, Chee Seng and van den Hengel, Anton and Wang, Liangwei},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year = {2020},
doi = {10.1109/CVPR42600.2020.01014},
url = {https://mlanthology.org/cvpr/2020/wang2020cvpr-general/}
}