Learning Relationship-Aware Visual Features

Messina, Nicola; Amato, Giuseppe; Carrara, Fabio; Falchi, Fabrizio; Gennaro, Claudio

doi:10.1007/978-3-030-11018-5_40

ECCVW 2018 pp. 486-501

Abstract

Relational reasoning in Computer Vision has recently shown impressive results on visual question answering tasks. On the challenging CLEVR dataset, the recently proposed Relation Network (RN), a simple plug-and-play module and one of the state-of-the-art approaches, achieves 95.5% accuracy on relational questions. In this paper, we define a sub-field of Content-Based Image Retrieval (CBIR) called Relational-CBIR (R-CBIR), whose goal is to retrieve images that contain given relationships among objects. To this end, we employ the RN architecture to extract relation-aware features from CLEVR images. To evaluate the effectiveness of these features, we extended both the CLEVR and Sort-of-CLEVR datasets, generating a ground truth for R-CBIR by exploiting the relational data embedded in scene graphs. Furthermore, we propose a modification of the RN module, a two-stage Relation Network (2S-RN), that enables us to extract relation-aware features through a preprocessing stage that focuses on the image content and sets the question aside. Experiments show that our RN features, especially the 2S-RN ones, outperform the state-of-the-art RMAC features on this new, challenging task.

Cite

Text

Messina et al. "Learning Relationship-Aware Visual Features." European Conference on Computer Vision Workshops, 2018. doi:10.1007/978-3-030-11018-5_40

Markdown

[Messina et al. "Learning Relationship-Aware Visual Features." European Conference on Computer Vision Workshops, 2018.](https://mlanthology.org/eccvw/2018/messina2018eccvw-learning/) doi:10.1007/978-3-030-11018-5_40

BibTeX

@inproceedings{messina2018eccvw-learning,
  title     = {{Learning Relationship-Aware Visual Features}},
  author    = {Messina, Nicola and Amato, Giuseppe and Carrara, Fabio and Falchi, Fabrizio and Gennaro, Claudio},
  booktitle = {European Conference on Computer Vision Workshops},
  year      = {2018},
  pages     = {486--501},
  doi       = {10.1007/978-3-030-11018-5_40},
  url       = {https://mlanthology.org/eccvw/2018/messina2018eccvw-learning/}
}