Context-Aware Captions from Context-Agnostic Supervision
Abstract
We introduce an inference technique to produce discriminative context-aware image captions (captions that describe differences between images or visual concepts) using only generic context-agnostic training data (captions that describe a concept or an image in isolation). For example, given images and captions of "siamese cat" and "tiger cat", we generate language that describes the "siamese cat" in a way that distinguishes it from "tiger cat". Our key novelty is that we show how to do joint inference over a language model that is context-agnostic and a listener which distinguishes closely-related concepts. We first apply our technique to a justification task, namely to describe why an image contains a particular fine-grained category as opposed to another closely-related category of the CUB-200-2011 dataset. We then study discriminative image captioning to generate language that uniquely refers to one of two semantically-similar images in the COCO dataset. Evaluations with discriminative ground truth for justification and human studies for discriminative image captioning reveal that our approach outperforms baseline generative and speaker-listener approaches for discrimination.
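A minimal sketch of the general idea behind such joint inference at decoding time: score each candidate word by blending its fluency under a context-agnostic model conditioned on the target with a contrastive term against the distractor. The toy probability tables, the blending weight `lam`, and the greedy one-step choice below are illustrative assumptions, not the paper's exact procedure.

```python
import math

# Toy per-token conditional distributions p(word | image, prefix),
# standing in for a trained context-agnostic captioning model.
# The numbers are made up for illustration.
TABLES = {
    "target":     {"cat": 0.5, "striped": 0.10, "pointed": 0.30, "<eos>": 0.1},
    "distractor": {"cat": 0.5, "striped": 0.35, "pointed": 0.05, "<eos>": 0.1},
}

def p_token(word, image, prefix):
    return TABLES[image][word]

def discriminative_score(word, prefix, lam=0.5):
    """Blend fluency on the target with contrast against the distractor:
    lam * log p(w|target) + (1 - lam) * log [p(w|target) / p(w|distractor)].
    """
    pt = p_token(word, "target", prefix)
    pd = p_token(word, "distractor", prefix)
    return lam * math.log(pt) + (1.0 - lam) * (math.log(pt) - math.log(pd))

vocab = ["cat", "striped", "pointed", "<eos>"]
# Greedy choice of the most discriminative next word given an empty prefix.
best = max(vocab, key=lambda w: discriminative_score(w, prefix=()))
print(best)  # "pointed": likely for the target, unlikely for the distractor
```

With `lam = 1` the score reduces to ordinary maximum-likelihood decoding ("cat" wins, since it is equally likely for both images); lowering `lam` pushes the decoder toward words that distinguish the target, which is the effect the joint speaker-listener inference is after.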
Cite
Text
Vedantam et al. "Context-Aware Captions from Context-Agnostic Supervision." Conference on Computer Vision and Pattern Recognition, 2017. doi:10.1109/CVPR.2017.120
Markdown
[Vedantam et al. "Context-Aware Captions from Context-Agnostic Supervision." Conference on Computer Vision and Pattern Recognition, 2017.](https://mlanthology.org/cvpr/2017/vedantam2017cvpr-contextaware/) doi:10.1109/CVPR.2017.120
BibTeX
@inproceedings{vedantam2017cvpr-contextaware,
title = {{Context-Aware Captions from Context-Agnostic Supervision}},
author = {Vedantam, Ramakrishna and Bengio, Samy and Murphy, Kevin and Parikh, Devi and Chechik, Gal},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2017},
doi = {10.1109/CVPR.2017.120},
url = {https://mlanthology.org/cvpr/2017/vedantam2017cvpr-contextaware/}
}