Spatial Knowledge Distillation to Aid Visual Reasoning

Somak Aditya, Rudra Saha, Yezhou Yang, Chitta Baral

WACV 2019 pp. 227-235

doi:10.1109/WACV.2019.00030

Abstract

For tasks involving language and vision, current state-of-the-art methods tend not to leverage additional information that might be present to gather relevant (commonsense) knowledge. A representative task is Visual Question Answering, where large diagnostic datasets have been proposed to test a system's capability of answering questions about images. The training data is often accompanied by annotations of individual object properties and spatial locations. In this work, we take a step towards integrating this additional privileged information in the form of spatial knowledge to aid in visual reasoning. We propose a framework that combines recent advances in knowledge distillation (teacher-student framework), relational reasoning, and probabilistic logical languages to incorporate such knowledge in existing neural networks for the task of Visual Question Answering. Specifically, for a question posed against an image, we use a probabilistic logical language to encode the spatial knowledge and the spatial understanding about the question in the form of a mask that is directly provided to the teacher network. The student network learns from the ground-truth information as well as the teacher's prediction via distillation. We also demonstrate the impact of predicting such a mask inside the teacher's network using attention. Empirically, we show that both methods improve the test accuracy over a state-of-the-art approach on a publicly available dataset.
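The distillation step mentioned in the abstract, where the student learns from both the ground truth and the teacher's prediction, can be illustrated with a generic teacher-student loss. The following is a minimal sketch and not the authors' implementation: the function names, the `imitation_weight` parameter, and the simple weighted sum of two cross-entropy terms are all assumptions chosen for illustration. The spatial mask and the probabilistic-logic component are not modeled here.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, predicted_probs, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted one."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, predicted_probs))


def distillation_loss(student_logits, teacher_logits, true_index, imitation_weight=0.5):
    """Hypothetical combined loss: hard-label term plus teacher-imitation term.

    imitation_weight is an assumed hyperparameter in [0, 1]; 0 ignores the
    teacher, 1 ignores the ground-truth label.
    """
    student_probs = softmax(student_logits)
    teacher_probs = softmax(teacher_logits)
    one_hot = [1.0 if i == true_index else 0.0 for i in range(len(student_logits))]

    hard_loss = cross_entropy(one_hot, student_probs)        # ground-truth supervision
    soft_loss = cross_entropy(teacher_probs, student_probs)  # imitate the teacher
    return (1.0 - imitation_weight) * hard_loss + imitation_weight * soft_loss


if __name__ == "__main__":
    # Toy example with three candidate answers; the teacher is confident in answer 0.
    print(distillation_loss([1.0, 0.5, -0.2], [3.0, 0.0, -1.0], true_index=0))
```

In a setup like the one the abstract describes, the teacher's logits would come from a network that also receives the spatial mask, while the student sees only the image and the question; that asymmetry is outside the scope of this sketch.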

Cite

Text

Aditya et al. "Spatial Knowledge Distillation to Aid Visual Reasoning." IEEE/CVF Winter Conference on Applications of Computer Vision, 2019. doi:10.1109/WACV.2019.00030

Markdown

[Aditya et al. "Spatial Knowledge Distillation to Aid Visual Reasoning." IEEE/CVF Winter Conference on Applications of Computer Vision, 2019.](https://mlanthology.org/wacv/2019/aditya2019wacv-spatial/) doi:10.1109/WACV.2019.00030

BibTeX

@inproceedings{aditya2019wacv-spatial,
  title     = {{Spatial Knowledge Distillation to Aid Visual Reasoning}},
  author    = {Aditya, Somak and Saha, Rudra and Yang, Yezhou and Baral, Chitta},
  booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision},
  year      = {2019},
  pages     = {227--235},
  doi       = {10.1109/WACV.2019.00030},
  url       = {https://mlanthology.org/wacv/2019/aditya2019wacv-spatial/}
}