Discriminability Objective for Training Descriptive Captions

Abstract

One property that remains lacking in image captions generated by contemporary methods is discriminability: being able to tell two images apart given the caption for one of them. We propose a way to improve this aspect of caption generation. By incorporating into the captioning training objective a loss component directly related to the ability of a machine to disambiguate image/caption matches, we obtain systems that produce much more discriminative captions, according to human evaluation. Remarkably, our approach also leads to improvement in other aspects of the generated captions, as reflected by a battery of standard scores such as BLEU and SPICE. Our approach is modular and can be applied to a variety of model/loss combinations commonly proposed for image captioning.
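As a rough illustration of the idea of adding a discriminability term to a captioning objective, the sketch below combines an arbitrary caption loss with a hinge-based contrastive penalty computed from image/caption similarity scores. The abstract does not specify the form of the loss, so everything here is an assumption made for illustration: the function names `contrastive_loss` and `combined_loss`, the similarity matrix `sim`, the weight `lam`, and the `margin` value are all hypothetical and may differ from what the paper actually uses.

```python
def contrastive_loss(sim, margin=0.2):
    """Hinge-based contrastive penalty over a square similarity matrix.

    sim[i][j] is a hypothetical similarity score between image i and
    caption j; matched pairs sit on the diagonal. Requires at least
    two image/caption pairs.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        pos = sim[i][i]
        # Highest-scoring mismatched caption for image i.
        hardest_caption = max(sim[i][j] for j in range(n) if j != i)
        # Highest-scoring mismatched image for caption i.
        hardest_image = max(sim[j][i] for j in range(n) if j != i)
        # Penalize mismatches that come within `margin` of the true match.
        total += max(0.0, margin + hardest_caption - pos)
        total += max(0.0, margin + hardest_image - pos)
    return total / n


def combined_loss(caption_loss, sim, lam=1.0, margin=0.2):
    """Caption loss plus a weighted discriminability penalty (assumed form)."""
    return caption_loss + lam * contrastive_loss(sim, margin)
```

In this sketch, a batch whose captions clearly match their own images (large diagonal, small off-diagonal scores) incurs no extra penalty, while captions that fit several images equally well raise the total loss. Because the penalty is simply added to whatever `caption_loss` is supplied, the same term can be attached to different base objectives, which is one way to read the modularity claim in the abstract.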

Cite

Text

Luo et al. "Discriminability Objective for Training Descriptive Captions." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018. doi:10.1109/CVPR.2018.00728

Markdown

[Luo et al. "Discriminability Objective for Training Descriptive Captions." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018.](https://mlanthology.org/cvpr/2018/luo2018cvpr-discriminability/) doi:10.1109/CVPR.2018.00728

BibTeX

@inproceedings{luo2018cvpr-discriminability,
  title     = {{Discriminability Objective for Training Descriptive Captions}},
  author    = {Luo, Ruotian and Price, Brian and Cohen, Scott and Shakhnarovich, Gregory},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2018},
  doi       = {10.1109/CVPR.2018.00728},
  url       = {https://mlanthology.org/cvpr/2018/luo2018cvpr-discriminability/}
}