Baby Talk: Understanding and Generating Simple Image Descriptions

Abstract

We posit that visually descriptive language offers computer vision researchers both information about the world, and information about how people describe the world. The potential benefit from this source is made more significant due to the enormous amount of language data easily available today. We present a system to automatically generate natural language descriptions from images that exploits both statistics gleaned from parsing large quantities of text data and recognition algorithms from computer vision. The system is very effective at producing relevant sentences for images. It also generates descriptions that are notably more true to the specific image content than previous work.
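The system the abstract describes detects objects, their attributes, and spatial relations, then composes them into a simple declarative sentence. As a rough illustration of that template-filling idea (a minimal sketch with made-up detections; the function name, inputs, and template wording are assumptions, not the authors' actual pipeline):

```python
# Hypothetical sketch of template-based caption generation in the spirit
# of the described system: (attribute, object) detections plus
# prepositional relations are slotted into a fixed sentence template.
# All names and the example inputs are illustrative assumptions.

def generate_caption(objects, relations):
    """objects: list of (attribute, noun) pairs from a detector.
    relations: list of (subject_idx, preposition, object_idx) triples."""
    # Build a noun phrase per detected object, e.g. "the brown dog".
    phrases = [f"the {attr} {noun}" if attr else f"the {noun}"
               for attr, noun in objects]
    head = "This is a picture of " + " and ".join(phrases) + "."
    # One clause per spatial relation between detected objects.
    clauses = [f"The {objects[s][1]} is {prep} the {objects[o][1]}."
               for s, prep, o in relations]
    return " ".join([head] + clauses)

caption = generate_caption(
    [("brown", "dog"), ("wooden", "chair")],
    [(0, "near", 1)],
)
# → "This is a picture of the brown dog and the wooden chair.
#    The dog is near the chair."
```

The actual paper scores candidate labelings with a CRF and draws word choices from text statistics; this sketch only shows the final sentence-assembly step.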

Cite

Text

Kulkarni et al. "Baby Talk: Understanding and Generating Simple Image Descriptions." IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2011. doi:10.1109/CVPR.2011.5995466

Markdown

[Kulkarni et al. "Baby Talk: Understanding and Generating Simple Image Descriptions." IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2011.](https://mlanthology.org/cvpr/2011/kulkarni2011cvpr-baby/) doi:10.1109/CVPR.2011.5995466

BibTeX

@inproceedings{kulkarni2011cvpr-baby,
  title     = {{Baby Talk: Understanding and Generating Simple Image Descriptions}},
  author    = {Kulkarni, Girish and Premraj, Visruth and Dhar, Sagnik and Li, Siming and Choi, Yejin and Berg, Alexander C. and Berg, Tamara L.},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2011},
  pages     = {1601--1608},
  doi       = {10.1109/CVPR.2011.5995466},
  url       = {https://mlanthology.org/cvpr/2011/kulkarni2011cvpr-baby/}
}