Relation Also Need Attention: Integrating Relation Information into Image Captioning

Abstract

Image captioning methods with attention mechanisms lead this field, especially models that combine global and local attention. However, few conventional models integrate the relationship information between different regions of the image. In this paper, such relationship features are embedded into a fused attention mechanism to explore the internal visual and semantic relations between different object regions. In addition, to alleviate the exposure bias problem and make training more efficient, we combine a Generative Adversarial Network with Reinforcement Learning and employ greedy decoding to generate a dynamic baseline reward for self-critical training. Finally, experiments on the MSCOCO dataset show that the model generates more accurate and vivid captions and outperforms previous advanced models on multiple prevailing metrics.
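The self-critical training mentioned above uses the score of a greedy-decoded caption as a dynamic baseline for the sampled caption's reward. The snippet below is a minimal sketch of this SCST-style objective; the function name, tensor shapes, and the use of PyTorch are illustrative assumptions, not the authors' actual implementation.

```python
import torch

def self_critical_loss(log_probs: torch.Tensor,
                       sampled_reward: torch.Tensor,
                       greedy_reward: torch.Tensor) -> torch.Tensor:
    """Policy-gradient loss with a greedy-decoding baseline (SCST-style sketch).

    log_probs:      (batch,) summed log-probabilities of the sampled caption tokens
    sampled_reward: (batch,) metric score (e.g. CIDEr) of the sampled captions
    greedy_reward:  (batch,) metric score of the greedily decoded captions (baseline)
    """
    # Dynamic baseline: advantage is the sampled reward minus the greedy reward.
    advantage = sampled_reward - greedy_reward
    # REINFORCE objective: maximize expected reward, i.e. minimize the
    # negative advantage-weighted log-probability of the sampled caption.
    return -(advantage.detach() * log_probs).mean()
```

Subtracting the greedy-decoding score reduces the variance of the policy gradient without training a separate value baseline, which is what makes the baseline "dynamic" in the sense described in the abstract.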

Cite

Text

Chen et al. "Relation Also Need Attention: Integrating Relation Information into Image Captioning." Proceedings of The 13th Asian Conference on Machine Learning, 2021.

Markdown

[Chen et al. "Relation Also Need Attention: Integrating Relation Information into Image Captioning." Proceedings of The 13th Asian Conference on Machine Learning, 2021.](https://mlanthology.org/acml/2021/chen2021acml-relation/)

BibTeX

@inproceedings{chen2021acml-relation,
  title     = {{Relation Also Need Attention: Integrating Relation Information into Image Captioning}},
  author    = {Chen, Tianyu and Li, Zhixin and Xian, Tiantao and Zhang, Canlong and Ma, Huifang},
  booktitle = {Proceedings of The 13th Asian Conference on Machine Learning},
  year      = {2021},
  pages     = {1537--1552},
  volume    = {157},
  url       = {https://mlanthology.org/acml/2021/chen2021acml-relation/}
}