A Simple Baseline for Weakly-Supervised Scene Graph Generation

Abstract

We investigate weakly-supervised scene graph generation, a challenging task because no correspondence between labels and objects is provided. Previous work treats this correspondence as a latent variable that is iteratively updated through nested optimization of the scene graph generation objective. We instead reduce the complexity by decoupling the problem: an efficient first-order graph matching module, optimized via contrastive learning, first obtains the correspondence, which is then used to train a standard scene graph generation model. Extensive experiments show that this simple pipeline significantly surpasses the previous state of the art by more than 30% on the Visual Genome dataset, in terms of both graph matching accuracy and scene graph quality. We believe this work serves as a strong baseline for future research.
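To make the decoupled pipeline concrete, here is a minimal sketch of what a first-order matching module trained with a contrastive objective could look like. It is not the authors' implementation. Assumptions: labels and detected regions are already embedded in a shared vector space, a label is scored against an image by its best-matching region (a multiple-instance-style choice), and the other images in the batch act as negatives in an InfoNCE-style loss. "First-order" is read here as matching each label independently from node-level similarity alone, with no pairwise relation terms. All function names (`cosine`, `label_to_image_score`, `contrastive_loss`, `first_order_match`) are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; the small epsilon guards against zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def label_to_image_score(label: Vector, regions: List[Vector]) -> float:
    """Assumed image-level score: the label's similarity to its best region."""
    return max(cosine(label, r) for r in regions)


def contrastive_loss(
    labels: List[List[Vector]],
    images: List[List[Vector]],
    temperature: float = 0.1,
) -> float:
    """InfoNCE-style loss over a batch (illustrative, not the paper's exact loss).

    labels[k] holds the label embeddings of image k and images[k] its region
    features. Each label's own image is the positive; other images are negatives.
    """
    total, count = 0.0, 0
    for k, image_labels in enumerate(labels):
        for lab in image_labels:
            logits = [label_to_image_score(lab, regs) / temperature for regs in images]
            # Numerically stable log-sum-exp over all candidate images.
            m = max(logits)
            log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
            total += log_denom - logits[k]  # negative log-softmax of the positive
            count += 1
    return total / count


def first_order_match(labels: List[Vector], regions: List[Vector]) -> List[int]:
    """Assign each label to a region independently, using only unary similarity."""
    return [max(range(len(regions)), key=lambda j: cosine(lab, regions[j])) for lab in labels]


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0]]
    labels = [[0.0, 2.0], [3.0, 0.1]]
    print(first_order_match(labels, regions))  # [1, 0]
```

In this sketch the resulting assignment would serve as pseudo ground truth for training an ordinary, fully supervised scene graph model, which is the decoupling the abstract describes. Because each label is matched independently, inference is a cheap argmax per label rather than a combinatorial search over graph structure.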

Cite

Text

Shi et al. "A Simple Baseline for Weakly-Supervised Scene Graph Generation." International Conference on Computer Vision, 2021. doi:10.1109/ICCV48922.2021.01608

Markdown

[Shi et al. "A Simple Baseline for Weakly-Supervised Scene Graph Generation." International Conference on Computer Vision, 2021.](https://mlanthology.org/iccv/2021/shi2021iccv-simple/) doi:10.1109/ICCV48922.2021.01608

BibTeX

@inproceedings{shi2021iccv-simple,
  title     = {{A Simple Baseline for Weakly-Supervised Scene Graph Generation}},
  author    = {Shi, Jing and Zhong, Yiwu and Xu, Ning and Li, Yin and Xu, Chenliang},
  booktitle = {International Conference on Computer Vision},
  year      = {2021},
  pages     = {16393--16402},
  doi       = {10.1109/ICCV48922.2021.01608},
  url       = {https://mlanthology.org/iccv/2021/shi2021iccv-simple/}
}