Image Captioning with Visual-Semantic LSTM

Abstract

In this paper, a novel image captioning approach is proposed to describe the content of images. Inspired by the visual processing of the human cognitive system, we propose a visual-semantic LSTM model that first locates the attended objects with their low-level features in the visual cell, and then successively extracts high-level semantic features in the semantic cell. In addition, a state perturbation term is introduced into the word sampling strategy of the REINFORCE-based method to explore suitable vocabulary during training. Experimental results on MS COCO and Flickr30K validate the effectiveness of our approach in comparison with state-of-the-art methods.
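The two-cell decoding and the perturbed REINFORCE sampling described above can be sketched as follows. This is a minimal PyTorch illustration, not the authors' released implementation: the module name, the wiring of the two cells, the soft-attention form, and the Gaussian noise scale sigma are all assumptions made for the example; only the overall flow (visual cell attends to low-level region features, semantic cell then consumes high-level semantic features, and a perturbation on the state broadens word sampling) follows the abstract.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualSemanticLSTM(nn.Module):
    # Hypothetical two-cell decoder: a visual cell over region features,
    # followed by a semantic cell over semantic features.
    def __init__(self, vis_dim, sem_dim, embed_dim, hidden_dim, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.visual_cell = nn.LSTMCell(embed_dim + vis_dim, hidden_dim)
        self.att = nn.Linear(hidden_dim + vis_dim, 1)
        self.semantic_cell = nn.LSTMCell(hidden_dim + sem_dim, hidden_dim)
        self.logits = nn.Linear(hidden_dim, vocab_size)

    def step(self, word, regions, semantics, state_v, state_s, sigma=0.0):
        # Soft attention over low-level region features (B, R, vis_dim),
        # conditioned on the visual cell's previous hidden state.
        h_v = state_v[0]
        scores = self.att(torch.cat(
            [h_v.unsqueeze(1).expand(-1, regions.size(1), -1), regions], dim=-1))
        alpha = F.softmax(scores, dim=1)
        context = (alpha * regions).sum(dim=1)
        state_v = self.visual_cell(
            torch.cat([self.embed(word), context], dim=-1), state_v)
        # Semantic cell successively refines the visual hidden state
        # with high-level semantic features (B, sem_dim).
        state_s = self.semantic_cell(
            torch.cat([state_v[0], semantics], dim=-1), state_s)
        h_s = state_s[0]
        # Assumed form of the state perturbation: Gaussian noise on the
        # hidden state before sampling, to explore alternative words
        # during REINFORCE training.
        if sigma > 0:
            h_s = h_s + sigma * torch.randn_like(h_s)
        probs = F.softmax(self.logits(h_s), dim=-1)
        word = probs.multinomial(1).squeeze(1)  # sampled next-word indices
        return word, state_v, state_s

Calling step with sigma > 0 only during the reinforcement-learning stage, and sigma = 0 at inference, matches the exploration role the abstract assigns to the perturbation term.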

Cite

Text

Li and Chen. "Image Captioning with Visual-Semantic LSTM." International Joint Conference on Artificial Intelligence, 2018. doi:10.24963/IJCAI.2018/110

Markdown

[Li and Chen. "Image Captioning with Visual-Semantic LSTM." International Joint Conference on Artificial Intelligence, 2018.](https://mlanthology.org/ijcai/2018/li2018ijcai-image/) doi:10.24963/IJCAI.2018/110

BibTeX

@inproceedings{li2018ijcai-image,
  title     = {{Image Captioning with Visual-Semantic LSTM}},
  author    = {Li, Nannan and Chen, Zhenzhong},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2018},
  pages     = {793--799},
  doi       = {10.24963/IJCAI.2018/110},
  url       = {https://mlanthology.org/ijcai/2018/li2018ijcai-image/}
}