Transformer-Based Dual Relation Graph for Multi-Label Image Recognition

Abstract

Recognizing multiple objects in a single image remains challenging because of several difficulties, including varying object scales, inconsistent appearances, and confusing inter-class relationships. Recent work mainly relies on statistical label co-occurrence and linguistic word embeddings to clarify ambiguous semantics. In contrast, we propose a novel Transformer-based Dual Relation learning framework that constructs complementary relationships from two aspects of correlation, i.e., a structural relation graph and a semantic relation graph. The structural relation graph captures long-range correlations from object context through a cross-scale transformer-based architecture. The semantic relation graph dynamically models the semantic meanings of image objects under explicit semantic-aware constraints. We also incorporate the learnt structural relationships into the semantic graph, constructing a joint relation graph for robust representations. With the collaborative learning of these two relation graphs, our approach achieves new state-of-the-art results on two popular multi-label recognition benchmarks, i.e., MS-COCO and VOC 2007.
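
To make the idea of a dynamically built semantic graph more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that per-class node vectors are pooled from spatial features by attention, that the adjacency is a row-normalised similarity between those nodes, and that one residual propagation step precedes per-class sigmoid scoring. The function `semantic_graph_scores`, the parameters `class_queries`, `w_graph`, and `w_out`, and all shapes are hypothetical and chosen only for illustration. The cross-scale transformer and the semantic-aware constraints mentioned in the abstract are not modelled here.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x, axis=-1):
    # Subtract the maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def semantic_graph_scores(features, class_queries, w_graph, w_out):
    """Illustrative only; every name and shape here is an assumption.

    features:      (N, D) flattened spatial features of one image
    class_queries: (C, D) one hypothetical query vector per label
    w_graph:       (D, D) weights for one graph propagation step
    w_out:         (C, D) per-class scoring weights
    """
    dim = features.shape[1]
    # Each class attends over spatial positions to pool a node vector.
    attn = softmax(class_queries @ features.T / np.sqrt(dim), axis=1)  # (C, N)
    nodes = attn @ features                                            # (C, D)
    # Image-specific adjacency from node similarity; rows sum to 1.
    adj = softmax(nodes @ nodes.T / np.sqrt(dim), axis=1)              # (C, C)
    # One propagation step with ReLU and a residual connection.
    nodes = nodes + np.maximum(adj @ nodes @ w_graph, 0.0)
    # Independent sigmoid per label, as is usual for multi-label output.
    logits = (nodes * w_out).sum(axis=1)                               # (C,)
    return sigmoid(logits), adj


rng = np.random.default_rng(0)
N, D, C = 49, 16, 5  # e.g. a 7x7 feature map, 16 channels, 5 labels
features = rng.normal(size=(N, D))
class_queries = rng.normal(size=(C, D))
w_graph = rng.normal(size=(D, D)) * 0.1
w_out = rng.normal(size=(C, D)) * 0.1

scores, adj = semantic_graph_scores(features, class_queries, w_graph, w_out)
print(scores.shape, adj.shape)
```

Because the adjacency is recomputed from each image's own node vectors, the label graph changes per input, in contrast to a fixed graph derived from dataset-level label co-occurrence statistics.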

Cite

Text

Zhao et al. "Transformer-Based Dual Relation Graph for Multi-Label Image Recognition." International Conference on Computer Vision, 2021. doi:10.1109/ICCV48922.2021.00023

Markdown

[Zhao et al. "Transformer-Based Dual Relation Graph for Multi-Label Image Recognition." International Conference on Computer Vision, 2021.](https://mlanthology.org/iccv/2021/zhao2021iccv-transformerbased/) doi:10.1109/ICCV48922.2021.00023

BibTeX

@inproceedings{zhao2021iccv-transformerbased,
  title     = {{Transformer-Based Dual Relation Graph for Multi-Label Image Recognition}},
  author    = {Zhao, Jiawei and Yan, Ke and Zhao, Yifan and Guo, Xiaowei and Huang, Feiyue and Li, Jia},
  booktitle = {International Conference on Computer Vision},
  year      = {2021},
  pages     = {163--172},
  doi       = {10.1109/ICCV48922.2021.00023},
  url       = {https://mlanthology.org/iccv/2021/zhao2021iccv-transformerbased/}
}