Context-Infused Visual Grounding for Art

Abstract

Many artwork collections contain textual attributes that provide rich and contextualised descriptions of artworks. Visual grounding offers the potential for localising subjects from these descriptions in images; however, existing approaches are trained on natural images and generalise poorly to art. In this paper, we present CIGAr (Context-Infused GroundingDINO for Art), a visual grounding approach which utilises the artwork descriptions as context during training, thereby enabling visual grounding on art. In addition, we present a new dataset, Ukiyo-eVG, with manually created phrase-grounding annotations, and we set a new state-of-the-art for object detection on two artwork datasets.
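The core idea, conditioning the grounding model on the full artwork description rather than the isolated phrase, can be illustrated with a minimal, hypothetical sketch. The function name and prompt format below are assumptions for illustration only; the paper's actual conditioning mechanism may differ:

```python
def build_context_prompt(description: str, phrase: str) -> tuple[str, int, int]:
    """Hypothetical helper: prepend the artwork description as context to
    the grounding phrase, and return the character span of the phrase so
    its tokens can later be aligned with predicted boxes.
    """
    prompt = f"{description} {phrase}"
    start = len(description) + 1  # phrase begins after the description + space
    end = start + len(phrase)
    return prompt, start, end


# Usage: ground the phrase "two actors" with the description as context.
prompt, start, end = build_context_prompt(
    "A woodblock print of two actors on a stage.", "two actors"
)
assert prompt[start:end] == "two actors"
```

A span-aware prompt like this lets a phrase-grounding model attend to the surrounding description while still scoring boxes against the target phrase's tokens only.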

Cite

Text

Khan and van Noord. "Context-Infused Visual Grounding for Art." European Conference on Computer Vision Workshops, 2024. doi:10.1007/978-3-031-91572-7_8

Markdown

[Khan and van Noord. "Context-Infused Visual Grounding for Art." European Conference on Computer Vision Workshops, 2024.](https://mlanthology.org/eccvw/2024/khan2024eccvw-contextinfused/) doi:10.1007/978-3-031-91572-7_8

BibTeX

@inproceedings{khan2024eccvw-contextinfused,
  title     = {{Context-Infused Visual Grounding for Art}},
  author    = {Khan, Selina and van Noord, Nanne},
  booktitle = {European Conference on Computer Vision Workshops},
  year      = {2024},
  pages     = {118--136},
  doi       = {10.1007/978-3-031-91572-7_8},
  url       = {https://mlanthology.org/eccvw/2024/khan2024eccvw-contextinfused/}
}