Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense Knowledge

Abstract

This work introduces an enhanced approach to scene graph generation that incorporates both a relationship hierarchy and commonsense knowledge. Specifically, we first propose a hierarchical relation head that exploits an informative hierarchical structure: it jointly predicts the relation super-category between each object pair in an image, along with the detailed relations under each super-category. We then implement a robust commonsense validation pipeline that harnesses foundation models to critique the outputs of the scene graph prediction system, removing nonsensical predicates even with a small, language-only model. Extensive experiments on the Visual Genome and OpenImage V6 datasets demonstrate that the proposed modules can be seamlessly integrated as plug-and-play enhancements to existing scene graph generation algorithms. The results show significant improvements, along with an extensive set of reasonable predictions beyond the dataset annotations. Code is available at https://github.com/bowenupenn/scene_graph_commonsense.
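The two modules described in the abstract can be illustrated with a small sketch. This is not the authors' implementation and rests on several assumptions. The super-category names and predicate lists in `HIERARCHY` are hypothetical. The joint score of a predicate r under super-category c is assumed to factor as P(c) · P(r | c). `is_plausible` is a hypothetical stand-in for the foundation-model critic; here it is a hard-coded rule set, so no model is queried. The sketch ranks all fine-grained predicates for one object pair, then keeps the top-k that pass the commonsense check.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical hierarchy (illustrative only): super-category -> fine-grained predicates.
HIERARCHY: Dict[str, List[str]] = {
    "geometric": ["above", "behind", "near", "on"],
    "possessive": ["has", "wearing", "part of"],
    "semantic": ["eating", "holding", "riding"],
}

Candidate = Tuple[str, str, float]  # (super-category, predicate, joint score)


def softmax(logits: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hierarchical_scores(super_logits: List[float],
                        fine_logits: Dict[str, List[float]]) -> List[Candidate]:
    """Assumed factorization: score(c, r) = P(c) * P(r | c), sorted high to low."""
    p_super = softmax(super_logits)
    scored: List[Candidate] = []
    for (cat, preds), p_c in zip(HIERARCHY.items(), p_super):
        p_fine = softmax(fine_logits[cat])  # one softmax per super-category
        for pred, p_r in zip(preds, p_fine):
            scored.append((cat, pred, p_c * p_r))
    scored.sort(key=lambda t: t[2], reverse=True)
    return scored


def commonsense_filter(subject: str, obj: str, candidates: List[Candidate],
                       is_plausible: Callable[[str, str, str], bool],
                       top_k: int = 3) -> List[Candidate]:
    """Walk the ranked list, dropping triplets the (hypothetical) critic rejects."""
    kept: List[Candidate] = []
    for cat, pred, score in candidates:
        if is_plausible(subject, pred, obj):
            kept.append((cat, pred, score))
            if len(kept) == top_k:
                break
    return kept


# Hypothetical stand-in for a foundation-model judgment: a fixed set of rejected triplets.
IMPLAUSIBLE = {("man", "eating", "horse")}


def toy_is_plausible(subject: str, predicate: str, obj: str) -> bool:
    return (subject, predicate, obj) not in IMPLAUSIBLE


# Usage with made-up logits for the pair (man, horse).
ranked = hierarchical_scores(
    super_logits=[0.5, -1.0, 1.5],  # geometric, possessive, semantic
    fine_logits={
        "geometric": [0.2, 0.1, 1.0, 2.0],
        "possessive": [0.0, 0.0, 0.0],
        "semantic": [1.0, 0.0, 3.0],
    },
)
kept = commonsense_filter("man", "horse", ranked, toy_is_plausible, top_k=3)
print([pred for _, pred, _ in kept])  # ['riding', 'on', 'near']
```

Factoring the score this way keeps the joint distribution over all fine-grained predicates normalized, since each P(r | c) sums to one within its super-category. Filtering after ranking means the critic only has to answer yes/no questions about a few candidate triplets rather than score every predicate.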

Cite

Text

Jiang et al. "Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense Knowledge." Winter Conference on Applications of Computer Vision, 2025.


BibTeX

@inproceedings{jiang2025wacv-enhancing,
  title     = {{Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense Knowledge}},
  author    = {Jiang, Bowen and Zhuang, Zhijun and Shivakumar, Shreyas S. and Taylor, Camillo J.},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2025},
  pages     = {8865--8876},
  url       = {https://mlanthology.org/wacv/2025/jiang2025wacv-enhancing/}
}