Quantifying the Amount of Visual Information Used by Neural Caption Generators

Abstract

Image caption generation systems are typically evaluated against reference outputs. We show that it is possible to predict output quality without generating the captions, based on the probability assigned by the neural model to the reference captions. Such pre-gen metrics are strongly correlated with standard evaluation metrics.
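
As a rough illustration of the idea in the abstract, the sketch below computes a probability-based score for a single image-caption pair: the mean log-probability that a trained caption generator assigns to the tokens of a reference caption. The model interface (a callable returning next-token logits) and the function name pregen_score are hypothetical placeholders for illustration, not the paper's implementation.

import torch
import torch.nn.functional as F

def pregen_score(model, image_features, reference_tokens):
    """Mean log-probability a caption model assigns to a reference caption.

    Assumptions (not from the paper): model(image_features, prefix) returns
    next-token logits of shape (len(prefix), vocab_size), and reference_tokens
    is a 1-D LongTensor of token ids including the start and end symbols.
    """
    with torch.no_grad():
        # Condition on the image and on all tokens except the last one,
        # so each position predicts the following reference token.
        logits = model(image_features, reference_tokens[:-1])
        log_probs = F.log_softmax(logits, dim=-1)
        # Pick out the log-probability of each actual reference token.
        token_lp = log_probs.gather(1, reference_tokens[1:].unsqueeze(1)).squeeze(1)
    return token_lp.mean().item()

Averaged over a test set, a score of this kind could then be correlated (for example with Pearson's r) against standard metrics such as BLEU or CIDEr computed on the generated captions.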

Cite

Text

Tanti et al. "Quantifying the Amount of Visual Information Used by Neural Caption Generators." European Conference on Computer Vision Workshops, 2018. doi:10.1007/978-3-030-11018-5_11

Markdown

[Tanti et al. "Quantifying the Amount of Visual Information Used by Neural Caption Generators." European Conference on Computer Vision Workshops, 2018.](https://mlanthology.org/eccvw/2018/tanti2018eccvw-quantifying/) doi:10.1007/978-3-030-11018-5_11

BibTeX

@inproceedings{tanti2018eccvw-quantifying,
  title     = {{Quantifying the Amount of Visual Information Used by Neural Caption Generators}},
  author    = {Tanti, Marc and Gatt, Albert and Camilleri, Kenneth P.},
  booktitle = {European Conference on Computer Vision Workshops},
  year      = {2018},
  pages     = {124--132},
  doi       = {10.1007/978-3-030-11018-5_11},
  url       = {https://mlanthology.org/eccvw/2018/tanti2018eccvw-quantifying/}
}