Triplet-Aware Scene Graph Embeddings

Abstract

Scene graphs have become an important form of structured knowledge for tasks such as visual relation detection, visual question answering, and image retrieval. While visualizing and interpreting word embeddings is well understood, scene graph embeddings have not been fully explored. In this work, we train scene graph embeddings in a layout generation task with varying forms of supervision, specifically introducing triplet supervision and data augmentation. After adding triplet supervision and data augmentation, we see a significant increase in both metrics that measure the quality of layout prediction: mean intersection-over-union (mIoU) (52.3% vs. 49.2%) and relation score (61.7% vs. 54.1%). To understand how these different methods affect the scene graph representation, we apply several new visualization and evaluation methods to explore the evolution of the scene graph embedding. We find that triplet supervision significantly improves the embedding separability, which is highly correlated with the performance of the layout prediction model.
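
To make the mIoU metric named in the abstract concrete, here is a minimal sketch of intersection-over-union averaged over paired boxes. It assumes axis-aligned boxes given as `(x0, y0, x1, y1)` and a one-to-one pairing of predicted and ground-truth boxes; the names `box_iou` and `mean_iou` are hypothetical, and the paper's exact evaluation protocol is not given in the abstract, so it may differ.

```python
from typing import Sequence, Tuple

# Assumed box format: (x0, y0, x1, y1) with x0 <= x1 and y0 <= y1.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Corners of the overlap rectangle (may be empty).
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    # Two degenerate (zero-area) boxes have no defined overlap; return 0.
    return inter / union if union > 0 else 0.0


def mean_iou(pred: Sequence[Box], gt: Sequence[Box]) -> float:
    """Average IoU over predicted/ground-truth boxes paired by index."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must contain the same number of boxes")
    if not pred:
        return 0.0
    return sum(box_iou(p, g) for p, g in zip(pred, gt)) / len(pred)
```

For example, `mean_iou([(0, 0, 2, 2)], [(1, 1, 3, 3)])` compares two 2×2 boxes that overlap in a 1×1 square, giving an IoU of 1/7.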

Cite

Text

Schroeder et al. "Triplet-Aware Scene Graph Embeddings." IEEE/CVF International Conference on Computer Vision Workshops, 2019. doi:10.1109/ICCVW.2019.00221

Markdown

[Schroeder et al. "Triplet-Aware Scene Graph Embeddings." IEEE/CVF International Conference on Computer Vision Workshops, 2019.](https://mlanthology.org/iccvw/2019/schroeder2019iccvw-tripletaware/) doi:10.1109/ICCVW.2019.00221

BibTeX

@inproceedings{schroeder2019iccvw-tripletaware,
  title     = {{Triplet-Aware Scene Graph Embeddings}},
  author    = {Schroeder, Brigit and Tripathi, Subarna and Tang, Hanlin},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2019},
  pages     = {1783--1787},
  doi       = {10.1109/ICCVW.2019.00221},
  url       = {https://mlanthology.org/iccvw/2019/schroeder2019iccvw-tripletaware/}
}