Every Picture Tells a Story: Generating Sentences from Images
Abstract
Humans can prepare concise descriptions of pictures, focusing on what they find important. We demonstrate that automatic methods can do so too. We describe a system that can compute a score linking an image to a sentence. This score can be used to attach a descriptive sentence to a given image, or to obtain images that illustrate a given sentence. The score is obtained by comparing an estimate of meaning obtained from the image to one obtained from the sentence. Each estimate of meaning comes from a discriminative procedure that is learned using data. We evaluate on a novel dataset consisting of human-annotated images. While our underlying estimate of meaning is impoverished, it is sufficient to produce very good quantitative results, evaluated with a novel score that can account for synecdoche.
Cite
Text
Farhadi et al. "Every Picture Tells a Story: Generating Sentences from Images." European Conference on Computer Vision, 2010. doi:10.1007/978-3-642-15561-1_2
Markdown
[Farhadi et al. "Every Picture Tells a Story: Generating Sentences from Images." European Conference on Computer Vision, 2010.](https://mlanthology.org/eccv/2010/farhadi2010eccv-every/) doi:10.1007/978-3-642-15561-1_2
BibTeX
@inproceedings{farhadi2010eccv-every,
title = {{Every Picture Tells a Story: Generating Sentences from Images}},
author = {Farhadi, Ali and Hejrati, Seyyed Mohammad Mohsen and Sadeghi, Mohammad Amin and Young, Peter and Rashtchian, Cyrus and Hockenmaier, Julia and Forsyth, David A.},
booktitle = {European Conference on Computer Vision},
year = {2010},
  pages = {15--29},
doi = {10.1007/978-3-642-15561-1_2},
url = {https://mlanthology.org/eccv/2010/farhadi2010eccv-every/}
}