Non-Autoregressive Image Captioning with Counterfactuals-Critical Multi-Agent Learning

Abstract

Most image captioning models are autoregressive, i.e. they generate each word conditioned on the previously generated words, which incurs heavy latency at inference time. Recently, non-autoregressive decoding has been proposed in machine translation to speed up inference by generating all words in parallel. Typically, these models use a word-level cross-entropy loss that optimizes each word independently. However, such a learning process fails to account for sentence-level consistency, resulting in inferior generation quality for these non-autoregressive models. In this paper, we propose a Non-Autoregressive Image Captioning (NAIC) model with a novel training paradigm: Counterfactuals-critical Multi-Agent Learning (CMAL). CMAL formulates NAIC as a multi-agent reinforcement learning system in which the positions in the target sequence are viewed as agents that learn to cooperatively maximize a sentence-level reward. In addition, we propose to utilize massive unlabeled images to boost captioning performance. Extensive experiments on the MSCOCO image captioning benchmark show that our NAIC model achieves performance comparable to state-of-the-art autoregressive models while bringing a 13.9x decoding speedup.
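The counterfactual-critical idea can be illustrated with a toy sketch: each position (agent) is credited with the difference between the full sentence's reward and the reward obtained when that position's word is swapped for a baseline token. This is only an assumed illustration of the mechanism described in the abstract; the paper uses a learned captioner and a CIDEr-style sentence reward, whereas here `toy_reward` is a stand-in unigram-precision score and `<unk>` is a hypothetical baseline token.

```python
def toy_reward(caption, reference):
    """Toy sentence-level reward: unigram precision against one reference.
    (A stand-in for the CIDEr-style reward mentioned in the abstract.)"""
    if not caption:
        return 0.0
    ref = set(reference)
    return sum(w in ref for w in caption) / len(caption)

def counterfactual_advantages(caption, reference, baseline_token="<unk>"):
    """Per-position advantage: the full-sentence reward minus the reward
    when that position's word is replaced by a counterfactual baseline.
    A positive value means the word at that position helped the team."""
    full = toy_reward(caption, reference)
    advantages = []
    for i in range(len(caption)):
        counterfactual = caption[:i] + [baseline_token] + caption[i + 1:]
        advantages.append(full - toy_reward(counterfactual, reference))
    return advantages

caption = ["a", "dog", "runs", "banana"]
reference = ["a", "dog", "runs", "fast"]
adv = counterfactual_advantages(caption, reference)
# Positions holding a helpful word get positive credit; the off-reference
# word "banana" gets zero, since removing it does not lower the reward.
```

Each agent's policy gradient would then be weighted by its own advantage, so all positions share one sentence-level objective while still receiving individualized credit.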

Cite

Text

Guo et al. "Non-Autoregressive Image Captioning with Counterfactuals-Critical Multi-Agent Learning." International Joint Conference on Artificial Intelligence, 2020. doi:10.24963/IJCAI.2020/107

Markdown

[Guo et al. "Non-Autoregressive Image Captioning with Counterfactuals-Critical Multi-Agent Learning." International Joint Conference on Artificial Intelligence, 2020.](https://mlanthology.org/ijcai/2020/guo2020ijcai-non/) doi:10.24963/IJCAI.2020/107

BibTeX

@inproceedings{guo2020ijcai-non,
  title     = {{Non-Autoregressive Image Captioning with Counterfactuals-Critical Multi-Agent Learning}},
  author    = {Guo, Longteng and Liu, Jing and Zhu, Xinxin and He, Xingjian and Jiang, Jie and Lu, Hanqing},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2020},
  pages     = {767--773},
  doi       = {10.24963/IJCAI.2020/107},
  url       = {https://mlanthology.org/ijcai/2020/guo2020ijcai-non/}
}