Visual Semantic Relatedness Dataset for Image Captioning

Abstract

Modern image captioning systems rely heavily on extracting knowledge from images to capture the concept of a static story. In this paper, we propose a textual visual context dataset for captioning, in which the publicly available dataset COCO Captions [30] has been extended with information about the scene (such as the objects in the image). Since this information has a textual form, it can be used to leverage any NLP task, such as text similarity or semantic relatedness methods, in captioning systems, either as an end-to-end training strategy or as a post-processing approach.
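The sketch below is one illustrative (not the authors' released) way such textual visual context could be used as a post-processing re-ranker: candidate captions are scored by their semantic relatedness to the detected-object text and re-ordered. The sentence-encoder model name and the example image data are assumptions made for the example.

```python
# Minimal re-ranking sketch, assuming a dataset row that pairs an image's
# visual context (object labels in textual form) with candidate captions.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder works

# Hypothetical example: visual context from an object detector/classifier,
# plus candidate captions produced by a beam-search captioner.
visual_context = "skateboard, person, street sign"
candidates = [
    "a man riding a skateboard down a street",
    "a man riding a horse on a beach",
    "a group of people standing around a kitchen",
]

# Encode the scene text and the candidate captions.
ctx_emb = model.encode(visual_context, convert_to_tensor=True)
cand_embs = model.encode(candidates, convert_to_tensor=True)

# Cosine similarity between each caption and the scene text = relatedness score.
scores = util.cos_sim(ctx_emb, cand_embs)[0]

# Re-rank: the caption most related to the visual context comes first.
reranked = sorted(zip(candidates, scores.tolist()), key=lambda x: x[1], reverse=True)
for caption, score in reranked:
    print(f"{score:.3f}  {caption}")
```

In an end-to-end setting, the same relatedness score could instead serve as an additional training signal rather than a post-hoc re-ranking step.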

Cite

Text

Sabir et al. "Visual Semantic Relatedness Dataset for Image Captioning." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023. doi:10.1109/CVPRW59228.2023.00592

Markdown

[Sabir et al. "Visual Semantic Relatedness Dataset for Image Captioning." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023.](https://mlanthology.org/cvprw/2023/sabir2023cvprw-visual/) doi:10.1109/CVPRW59228.2023.00592

BibTeX

@inproceedings{sabir2023cvprw-visual,
  title     = {{Visual Semantic Relatedness Dataset for Image Captioning}},
  author    = {Sabir, Ahmed and Moreno-Noguer, Francesc and Padró, Lluís},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2023},
  pages     = {5598--5606},
  doi       = {10.1109/CVPRW59228.2023.00592},
  url       = {https://mlanthology.org/cvprw/2023/sabir2023cvprw-visual/}
}