Hierarchical Attention Network for Image Captioning

Abstract

Recently, attention mechanisms have been successfully applied to image captioning, but existing attention methods are built only on low-level spatial features or high-level text features, which limits the richness of the captions. In this paper, we propose a Hierarchical Attention Network (HAN) that enables attention to be calculated on a pyramidal hierarchy of features synchronously. The pyramidal hierarchy consists of features at diverse semantic levels, which allows different words to be predicted from different features. Because the features come from different modalities, we also propose a Multivariate Residual Module (MRM) to learn joint representations from them. The MRM is able to model projections and extract relevant relations among different features. Furthermore, we introduce a context gate to balance the contributions of the different features. Compared with existing methods, our approach applies hierarchical features and exploits several multimodal integration strategies, which can significantly improve performance. The HAN is evaluated on the benchmark MSCOCO dataset, and the experimental results indicate that our model outperforms state-of-the-art methods, achieving a BLEU1 score of 80.9 and a CIDEr score of 121.7 on the Karpathy test split.
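
The abstract does not give the formulation of the context gate, so the following is only a minimal sketch of what a gate that balances two feature vectors could look like. It assumes a single scalar sigmoid gate over the concatenated features; the function name `context_gate`, the parameters `weights` and `bias`, and the blending rule are all hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def context_gate(visual, textual, weights, bias=0.0):
    """Blend two equal-length feature vectors with a scalar sigmoid gate.

    Assumed (hypothetical) formulation, not the paper's:
        g     = sigmoid(w . [visual; textual] + b)
        fused = g * visual + (1 - g) * textual
    """
    if len(visual) != len(textual):
        raise ValueError("feature vectors must have the same length")
    joint = list(visual) + list(textual)
    if len(weights) != len(joint):
        raise ValueError("weights must match the concatenated length")
    # Scalar gate computed from the concatenated features.
    g = sigmoid(sum(w * x for w, x in zip(weights, joint)) + bias)
    # Convex combination: g near 1 favours `visual`, g near 0 favours `textual`.
    fused = [g * v + (1.0 - g) * t for v, t in zip(visual, textual)]
    return g, fused


# With all-zero weights and bias the gate is 0.5, so the result is the mean.
gate, fused = context_gate([1.0, 3.0], [3.0, 1.0], [0.0, 0.0, 0.0, 0.0])
```

In a trained model the gate parameters would be learned, and the gate might well be element-wise rather than scalar; the scalar form is used here only to keep the sketch short.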

Cite

Text

Wang et al. "Hierarchical Attention Network for Image Captioning." AAAI Conference on Artificial Intelligence, 2019. doi:10.1609/AAAI.V33I01.33018957

Markdown

[Wang et al. "Hierarchical Attention Network for Image Captioning." AAAI Conference on Artificial Intelligence, 2019.](https://mlanthology.org/aaai/2019/wang2019aaai-hierarchical-b/) doi:10.1609/AAAI.V33I01.33018957

BibTeX

@inproceedings{wang2019aaai-hierarchical-b,
  title     = {{Hierarchical Attention Network for Image Captioning}},
  author    = {Wang, Weixuan and Chen, Zhihong and Hu, Haifeng},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2019},
  pages     = {8957-8964},
  doi       = {10.1609/AAAI.V33I01.33018957},
  url       = {https://mlanthology.org/aaai/2019/wang2019aaai-hierarchical-b/}
}