Diverse Image Captioning via GroupTalk

Abstract

Generally speaking, different persons tend to describe images from various aspects due to their individually subjective perception. As a result, generating the appropriate descriptions of images with both diversity and high quality is of great importance. In this paper, we propose a framework called GroupTalk to learn multiple image caption distributions simultaneously and effectively mimic the diversity of the image captions written by human beings. In particular, a novel iterative update strategy is proposed to separate training sentence samples into groups and learn their distributions at the same time. Furthermore, we introduce an efficient classifier to solve the problem brought about by the non-linear and discontinuous nature of language distributions which will impair performance. Experiments on several benchmark datasets show that GroupTalk naturally diversifies the generated captions of each image without sacrificing the accuracy.

Cite

Text

Wang et al. "Diverse Image Captioning via GroupTalk." International Joint Conference on Artificial Intelligence, 2016.

Markdown

[Wang et al. "Diverse Image Captioning via GroupTalk." International Joint Conference on Artificial Intelligence, 2016.](https://mlanthology.org/ijcai/2016/wang2016ijcai-diverse/)

BibTeX

@inproceedings{wang2016ijcai-diverse,
  title     = {{Diverse Image Captioning via GroupTalk}},
  author    = {Wang, Zhuhao and Wu, Fei and Lu, Weiming and Xiao, Jun and Li, Xi and Zhang, Zitong and Zhuang, Yueting},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2016},
  pages     = {2957--2964},
  url       = {https://mlanthology.org/ijcai/2016/wang2016ijcai-diverse/}
}