Order-Free RNN with Visual Attention for Multi-Label Classification

Abstract

We propose a recurrent neural network (RNN) based model for image multi-label classification. Our model uniquely integrates the learning of visual attention and Long Short-Term Memory (LSTM) layers, which jointly learn the labels of interest and their co-occurrences while the associated image regions are visually attended. Unlike existing approaches that utilize either model in their network architectures, training our model does not require pre-defined label orders. Moreover, a robust inference process is introduced so that prediction errors do not propagate and thus affect the performance. Our experiments on the NUS-WIDE and MS-COCO datasets confirm the design of our network and its effectiveness in solving multi-label classification problems.
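As a rough illustration of the two ideas in the abstract, the following minimal sketch combines soft visual attention over region features with a masked, order-free greedy decoding step. All names, dimensions, and weights here are illustrative placeholders, not the authors' architecture; in particular, the LSTM state (which would condition attention and scoring on previously emitted labels) is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for CNN region features and learned parameters.
num_regions, feat_dim, num_labels = 4, 8, 5
regions = rng.standard_normal((num_regions, feat_dim))  # per-region features
w_att = rng.standard_normal(feat_dim)                   # attention scorer
W_out = rng.standard_normal((feat_dim, num_labels))     # label classifier

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

predicted = []
for _ in range(3):  # emit up to 3 labels
    att = softmax(regions @ w_att)   # attention weights over image regions
    context = att @ regions          # attended visual context vector
    scores = context @ W_out         # per-label scores
    # Order-free, robust step: mask labels already emitted so an earlier
    # prediction cannot be repeated, then greedily take the best remainder.
    scores[predicted] = -np.inf
    predicted.append(int(np.argmax(scores)))

print(predicted)  # three distinct label indices, in confidence order
```

Because decoding masks prior predictions rather than committing to a fixed label sequence, no pre-defined label order is needed at this (toy) inference stage.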

Cite

Text

Chen et al. "Order-Free RNN with Visual Attention for Multi-Label Classification." AAAI Conference on Artificial Intelligence, 2018. doi:10.1609/AAAI.V32I1.12230

Markdown

[Chen et al. "Order-Free RNN with Visual Attention for Multi-Label Classification." AAAI Conference on Artificial Intelligence, 2018.](https://mlanthology.org/aaai/2018/chen2018aaai-order/) doi:10.1609/AAAI.V32I1.12230

BibTeX

@inproceedings{chen2018aaai-order,
  title     = {{Order-Free RNN with Visual Attention for Multi-Label Classification}},
  author    = {Chen, Shang-Fu and Chen, Yi-Chen and Yeh, Chih-Kuan and Wang, Yu-Chiang Frank},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2018},
  pages     = {6714--6721},
  doi       = {10.1609/AAAI.V32I1.12230},
  url       = {https://mlanthology.org/aaai/2018/chen2018aaai-order/}
}