BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues

Abstract

Effectively aligning with human judgment when evaluating machine-generated image captions represents a complex yet intriguing challenge. Existing evaluation metrics like CIDEr or CLIP-Score fall short in this regard as they do not take into account the corresponding image or lack the capability of encoding fine-grained details and penalizing hallucinations. To overcome these issues, in this paper, we propose BRIDGE, a new learnable and reference-free image captioning metric that employs a novel module to map visual features into dense vectors and integrates them into multi-modal pseudo-captions which are built during the evaluation process. This approach results in a multimodal metric that properly incorporates information from the input image without relying on reference captions, bridging the gap between human judgment and machine-generated image captions. Experiments spanning several datasets demonstrate that our proposal achieves state-of-the-art results compared to existing reference-free evaluation scores. Our source code and trained models are publicly available at: https://github.com/aimagelab/bridge-score

Cite

Text

Sarto et al. "BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73229-4_5

Markdown

[Sarto et al. "BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/sarto2024eccv-bridge/) doi:10.1007/978-3-031-73229-4_5

BibTeX

@inproceedings{sarto2024eccv-bridge,
  title     = {{BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues}},
  author    = {Sarto, Sara and Cornia, Marcella and Baraldi, Lorenzo and Cucchiara, Rita},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-73229-4_5},
  url       = {https://mlanthology.org/eccv/2024/sarto2024eccv-bridge/}
}