Review Networks for Caption Generation

Abstract

We propose a novel extension of the encoder-decoder framework, called a review network. The review network is generic and can enhance any existing encoder-decoder model: in this paper, we consider RNN decoders with both CNN and RNN encoders. The review network performs a number of review steps with an attention mechanism over the encoder hidden states and outputs a thought vector after each review step; the thought vectors are used as the input of the attention mechanism in the decoder. We show that conventional encoder-decoders are a special case of our framework. Empirically, we show that our framework improves over state-of-the-art encoder-decoder systems on the tasks of image captioning and source code captioning.
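The core mechanism the abstract describes (review steps that attend over encoder hidden states and emit thought vectors for the decoder to attend over) can be sketched as follows. This is an illustrative numpy sketch, not the authors' exact equations: the weight matrices `W` and `F` and the tanh review cell are simplified stand-ins for the paper's attentive review units.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def review_steps(H, T, d, rng=None):
    """Run T review steps over encoder hidden states H of shape (n, d).

    Each step uses the previous review state as a query to attend over H,
    then updates the state with a simple tanh cell (a stand-in for the
    paper's review unit) and records it as a thought vector. The decoder
    would later attend over the returned (T, d) stack of thought vectors
    instead of attending over H directly.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    W = rng.standard_normal((d, d)) * 0.1       # illustrative attention projection
    F = rng.standard_normal((2 * d, d)) * 0.1   # illustrative review-cell weights
    state = np.zeros(d)
    thoughts = []
    for _ in range(T):
        scores = H @ (W @ state)                # one score per encoder state
        alpha = softmax(scores)                 # attention weights (sum to 1)
        ctx = alpha @ H                         # attended context vector
        state = np.tanh(np.concatenate([ctx, state]) @ F)  # review-cell update
        thoughts.append(state)
    return np.stack(thoughts)                   # (T, d) thought vectors
```

Setting `T = 0` yields no thought vectors, and letting the decoder attend over `H` itself recovers a conventional attentive encoder-decoder, which is the sense in which such models are a special case of the framework.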

Cite

Text

Yang et al. "Review Networks for Caption Generation." Neural Information Processing Systems, 2016.

Markdown

[Yang et al. "Review Networks for Caption Generation." Neural Information Processing Systems, 2016.](https://mlanthology.org/neurips/2016/yang2016neurips-review/)

BibTeX

@inproceedings{yang2016neurips-review,
  title     = {{Review Networks for Caption Generation}},
  author    = {Yang, Zhilin and Yuan, Ye and Wu, Yuexin and Cohen, William W. and Salakhutdinov, Ruslan},
  booktitle = {Neural Information Processing Systems},
  year      = {2016},
  pages     = {2361--2369},
  url       = {https://mlanthology.org/neurips/2016/yang2016neurips-review/}
}