TRRNet: Tiered Relation Reasoning for Compositional Visual Question Answering

Abstract

Compositional visual question answering requires reasoning over both semantic and geometric object relations. We propose a novel tiered reasoning method that dynamically selects object-level candidates based on language representations and generates robust pairwise relations among the selected candidate objects. The proposed tiered relation reasoning method is compatible with most existing visual reasoning frameworks and yields significant performance improvements at very little extra computational cost. Moreover, we propose a policy network that decides the appropriate reasoning steps based on question complexity and current reasoning status. In experiments, our model achieves state-of-the-art performance on two VQA datasets.
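The two tiers described in the abstract can be illustrated with a minimal sketch. This is not the authors' architecture: the function names (`select_candidates`, `pairwise_relations`), the dot-product scoring, the top-k cutoff, and the toy features below are all assumptions chosen for illustration. The first tier keeps the objects that the question representation attends to most, and the second tier builds semantic and geometric relations only among those candidates.

```python
import math
from itertools import permutations


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_candidates(objects, question, k):
    """Tier 1 (assumed form): language-conditioned attention, keep the top-k objects."""
    attn = softmax([dot(o, question) for o in objects])
    ranked = sorted(range(len(objects)), key=lambda i: attn[i], reverse=True)
    return sorted(ranked[:k]), attn


def pairwise_relations(objects, centers, candidates):
    """Tier 2 (assumed form): relations only among the selected candidates."""
    relations = {}
    for i, j in permutations(candidates, 2):
        semantic = dot(objects[i], objects[j])   # toy semantic affinity
        dx = centers[j][0] - centers[i][0]       # toy geometric offset
        dy = centers[j][1] - centers[i][1]
        relations[(i, j)] = (semantic, dx, dy)
    return relations


# Hypothetical toy inputs: four object features, their box centers, one question vector.
objects = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.2]]
centers = [(0.1, 0.5), (0.8, 0.5), (0.3, 0.2), (0.6, 0.9)]
question = [2.0, 0.0]

candidates, attn = select_candidates(objects, question, k=2)
relations = pairwise_relations(objects, centers, candidates)
print(candidates, sorted(relations))
```

With k selected candidates out of N objects, the second tier computes k(k-1) ordered pairs instead of N(N-1), which is one plausible reading of why restricting relations to selected candidates adds little computational cost.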

Cite

Text

Yang et al. "TRRNet: Tiered Relation Reasoning for Compositional Visual Question Answering." Proceedings of the European Conference on Computer Vision (ECCV), 2020. doi:10.1007/978-3-030-58589-1_25

Markdown

[Yang et al. "TRRNet: Tiered Relation Reasoning for Compositional Visual Question Answering." Proceedings of the European Conference on Computer Vision (ECCV), 2020.](https://mlanthology.org/eccv/2020/yang2020eccv-trrnet/) doi:10.1007/978-3-030-58589-1_25

BibTeX

@inproceedings{yang2020eccv-trrnet,
  title     = {{TRRNet: Tiered Relation Reasoning for Compositional Visual Question Answering}},
  author    = {Yang, Xiaofeng and Lin, Guosheng and Lv, Fengmao and Liu, Fayao},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2020},
  doi       = {10.1007/978-3-030-58589-1_25},
  url       = {https://mlanthology.org/eccv/2020/yang2020eccv-trrnet/}
}