Composite Relationship Fields with Transformers for Scene Graph Generation

Abstract

Scene graph generation (SGG) methods extract relationships between objects. Most methods focus on improving top-down approaches, which build a scene graph from objects detected by an off-the-shelf object detector. Relatively little work addresses bottom-up approaches, which detect objects and their relationships jointly in a single stage. In this work, we present a novel bottom-up SGG approach that represents relationships using Composite Relationship Fields (CoRF). CoRF turns relationship detection into a dense regression and classification task, in which each cell of the output feature map identifies surrounding objects and their relationships. Furthermore, we propose a refinement head that leverages Transformers for global scene reasoning, resulting in more meaningful relationship predictions. By combining both contributions, our method outperforms previous bottom-up methods on the Visual Genome dataset by 26% while preserving real-time performance.
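The abstract does not specify CoRF's exact output layout, so the following is only a rough illustration of how a dense "each cell identifies surrounding objects and their relationships" output could be decoded into triplets. It assumes that every cell predicts a confidence, per-predicate scores, and two regressed offsets pointing at the subject and object centres, and that offsets are snapped to the nearest detected object. `FieldCell`, `decode_relations`, and the 0.5 threshold are hypothetical names and values, not the paper's implementation.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class FieldCell:
    """One output cell of a hypothetical relationship field (assumed layout, not CoRF's)."""
    confidence: float                # probability that a relationship is anchored at this cell
    predicate_scores: list[float]    # one score per predicate class
    to_subject: tuple[float, float]  # regressed (dx, dy) offset to the subject centre
    to_object: tuple[float, float]   # regressed (dx, dy) offset to the object centre


def nearest(point, centres):
    """Index of the detected object centre closest to `point` (Euclidean distance)."""
    return min(range(len(centres)),
               key=lambda i: hypot(point[0] - centres[i][0], point[1] - centres[i][1]))


def decode_relations(field, centres, threshold=0.5):
    """Turn dense per-cell predictions into (subject, predicate, object, score) triplets.

    field:   dict mapping (x, y) grid coordinates to FieldCell
    centres: list of (x, y) centres of detected objects
    """
    best = {}
    for (x, y), cell in field.items():
        if cell.confidence < threshold:
            continue  # skip cells that do not fire
        subj = nearest((x + cell.to_subject[0], y + cell.to_subject[1]), centres)
        obj = nearest((x + cell.to_object[0], y + cell.to_object[1]), centres)
        if subj == obj:
            continue  # a relationship needs two distinct objects
        pred = max(range(len(cell.predicate_scores)), key=cell.predicate_scores.__getitem__)
        score = cell.confidence * cell.predicate_scores[pred]
        key = (subj, pred, obj)
        # Several neighbouring cells usually vote for the same triplet; keep the strongest vote.
        best[key] = max(best.get(key, 0.0), score)
    return sorted(((s, p, o, sc) for (s, p, o), sc in best.items()), key=lambda t: -t[3])


# Usage: two objects, two cells voting for the same triplet, one cell below threshold.
centres = [(0.0, 0.0), (4.0, 0.0)]
field = {
    (2, 0): FieldCell(0.8, [0.1, 0.9], (-2.0, 0.0), (2.0, 0.0)),
    (2, 1): FieldCell(0.6, [0.2, 0.8], (-2.0, -1.0), (2.0, -1.0)),
    (1, 1): FieldCell(0.3, [0.9, 0.1], (-1.0, -1.0), (3.0, -1.0)),
}
print(decode_relations(field, centres))
```

Snapping to the nearest detected centre is just one simple way to associate regressed offsets with objects. Because nearby cells produce redundant votes, the sketch keeps only the highest-scoring vote per triplet.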

Cite

Text

Adaimi et al. "Composite Relationship Fields with Transformers for Scene Graph Generation." Winter Conference on Applications of Computer Vision, 2023.

Markdown

[Adaimi et al. "Composite Relationship Fields with Transformers for Scene Graph Generation." Winter Conference on Applications of Computer Vision, 2023.](https://mlanthology.org/wacv/2023/adaimi2023wacv-composite/)

BibTeX

@inproceedings{adaimi2023wacv-composite,
  title     = {{Composite Relationship Fields with Transformers for Scene Graph Generation}},
  author    = {Adaimi, George and Mizrahi, David and Alahi, Alexandre},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2023},
  pages     = {52--64},
  url       = {https://mlanthology.org/wacv/2023/adaimi2023wacv-composite/}
}