One-Shot Learning for Long-Tail Visual Relation Detection

Abstract

The aim of visual relation detection is to provide a comprehensive understanding of an image by describing all the objects within the scene and how they relate to each other, in <object-predicate-object> form; for example, <person-lean on-wall>. This ability is vital for image captioning, visual question answering, and many other applications. However, visual relationships follow a long-tailed distribution, and the resulting scarcity of training samples for rare relationships limits the practicality of conventional detection approaches. With this in mind, we designed a novel model for visual relation detection that works in one-shot settings. The embeddings of objects and predicates are extracted through a network that includes a feature-level attention mechanism. Attention alleviates some of the problems with feature sparsity, and the resulting representations capture more discriminative latent features. The core of our model is a dual graph neural network that passes and aggregates the context information of predicates and objects in an episodic training scheme to improve recognition of one-shot predicates and then generate the triplets. To the best of our knowledge, we are the first to focus on the viability of one-shot learning for visual relation detection. Extensive experiments on two newly constructed datasets show that our model significantly improves performance on two tasks, PredCls and SGCls, with gains ranging from 2.8% to 12.2% over state-of-the-art baselines.
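The abstract mentions an episodic training scheme for one-shot predicates but gives no details of it. As a rough illustration of what a one-shot episode generally looks like, here is a minimal Python sketch. It assumes each predicate's instances are already encoded as fixed-length feature vectors; the toy 2-D vectors, the predicates other than "lean on", and all function names are hypothetical. The sketch samples an N-way 1-shot episode and labels each query by cosine similarity to the single support example of each predicate. This nearest-neighbour rule is a generic stand-in, not the paper's attention-based embedding network or dual graph neural network.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical data layout: each predicate maps to a list of feature vectors,
# one per annotated <object-predicate-object> instance (assumed pre-encoded).
Features = Sequence[float]


def sample_episode(
    data: Dict[str, List[Features]],
    n_way: int,
    n_query: int,
    rng: random.Random,
) -> Tuple[Dict[str, Features], List[Tuple[str, Features]]]:
    """Build one N-way 1-shot episode: one support example per sampled
    predicate, plus n_query held-out query examples per predicate."""
    # Only predicates with enough instances for 1 support + n_query queries.
    eligible = [p for p, xs in data.items() if len(xs) >= 1 + n_query]
    predicates = rng.sample(eligible, n_way)
    support: Dict[str, Features] = {}
    queries: List[Tuple[str, Features]] = []
    for p in predicates:
        picked = rng.sample(data[p], 1 + n_query)
        support[p] = picked[0]  # the single "shot" for this predicate
        queries.extend((p, x) for x in picked[1:])
    return support, queries


def cosine(a: Features, b: Features) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(query: Features, support: Dict[str, Features]) -> str:
    """Label a query with the predicate of its most similar support example.
    A plain nearest-neighbour stand-in, NOT the paper's dual GNN."""
    return max(support, key=lambda p: cosine(query, support[p]))


# Toy usage with hypothetical, well-separated 2-D features.
data = {
    "lean on": [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0]],
    "ride": [[0.0, 1.0], [0.1, 0.9], [0.2, 1.0]],
    "hold": [[-1.0, 0.0], [-0.9, 0.1], [-1.0, -0.1]],
}
rng = random.Random(0)
support, queries = sample_episode(data, n_way=3, n_query=2, rng=rng)
accuracy = sum(classify(x, support) == p for p, x in queries) / len(queries)
```

In a real one-shot setup, the predicates used to build training episodes would be disjoint from the rare predicates evaluated at test time, and the fixed cosine rule would be replaced by a learned model trained across many such episodes.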

Cite

Text

Wang et al. "One-Shot Learning for Long-Tail Visual Relation Detection." AAAI Conference on Artificial Intelligence, 2020. doi:10.1609/AAAI.V34I07.6904

Markdown

[Wang et al. "One-Shot Learning for Long-Tail Visual Relation Detection." AAAI Conference on Artificial Intelligence, 2020.](https://mlanthology.org/aaai/2020/wang2020aaai-one/) doi:10.1609/AAAI.V34I07.6904

BibTeX

@inproceedings{wang2020aaai-one,
  title     = {{One-Shot Learning for Long-Tail Visual Relation Detection}},
  author    = {Wang, Weitao and Wang, Meng and Wang, Sen and Long, Guodong and Yao, Lina and Qi, Guilin and Chen, Yang},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2020},
  pages     = {12225-12232},
  doi       = {10.1609/AAAI.V34I07.6904},
  url       = {https://mlanthology.org/aaai/2020/wang2020aaai-one/}
}