DISCO: Describing Images Using Scene Contexts and Objects

Abstract

In this paper, we propose a bottom-up approach to generating short descriptive sentences from images, to enhance scene understanding. We demonstrate automatic methods for mapping the visual content of an image to natural spoken or written language. We also introduce a human-in-the-loop evaluation strategy that quantitatively captures the meaningfulness of the generated sentences. We recorded a correctness rate of 60.34% when human users were asked to judge the meaningfulness of sentences generated from relatively challenging images. Our automatic methods also compared well with state-of-the-art techniques on the related computer vision tasks.
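The correctness rate quoted above is simply the fraction of generated sentences that human judges marked as meaningful. A minimal sketch of that arithmetic follows; the function name, the judgment data, and the 35-of-58 split are illustrative assumptions chosen only to reproduce a figure close to 60.34%, not counts reported in the paper.

# Hypothetical illustration of a human-in-the-loop correctness rate:
# each entry is one judge's verdict on whether a generated sentence
# is meaningful. Data and names are invented for this sketch.

def correctness_rate(judgments: list[bool]) -> float:
    """Percentage of sentences judged meaningful."""
    if not judgments:
        raise ValueError("no judgments provided")
    return 100.0 * sum(judgments) / len(judgments)

if __name__ == "__main__":
    # Example only: 35 of 58 sentences judged meaningful -> about 60.34%
    example = [True] * 35 + [False] * 23
    print(f"Correctness rate: {correctness_rate(example):.2f}%")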

Cite

Text

Nwogu et al. "DISCO: Describing Images Using Scene Contexts and Objects." AAAI Conference on Artificial Intelligence, 2011. doi:10.1609/AAAI.V25I1.7978

Markdown

[Nwogu et al. "DISCO: Describing Images Using Scene Contexts and Objects." AAAI Conference on Artificial Intelligence, 2011.](https://mlanthology.org/aaai/2011/nwogu2011aaai-disco/) doi:10.1609/AAAI.V25I1.7978

BibTeX

@inproceedings{nwogu2011aaai-disco,
  title     = {{DISCO: Describing Images Using Scene Contexts and Objects}},
  author    = {Nwogu, Ifeoma and Zhou, Yingbo and Brown, Christopher},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2011},
  pages     = {1487--1493},
  doi       = {10.1609/AAAI.V25I1.7978},
  url       = {https://mlanthology.org/aaai/2011/nwogu2011aaai-disco/}
}