Exploring Phrase Grounding Without Training: Contextualisation and Extension to Text-Based Image Retrieval
Abstract
Grounding phrases in images links the visual and the textual modalities and is useful for many image understanding and multimodal tasks. All known models rely heavily on annotated data and complex trainable systems to perform phrase grounding – except for a recent work [38] that proposes a system requiring neither training nor aligned data, yet is able to compete with (weakly) supervised systems on popular phrase grounding datasets. We explore and expand the upper bound of such a system by contextualising both the image and language representation with structured representations. We show that our extensions benefit the model and establish a harder but fairer baseline for (weakly) supervised models. We also perform a stress test to assess the further applicability of such a system for creating a sentence retrieval system requiring neither training nor annotated data. We show that such models have a difficult start and a long way to go, and that more research is needed.
Cite
Text
Parcalabescu and Frank. "Exploring Phrase Grounding Without Training: Contextualisation and Extension to Text-Based Image Retrieval." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020. doi:10.1109/CVPRW50498.2020.00489
Markdown
[Parcalabescu and Frank. "Exploring Phrase Grounding Without Training: Contextualisation and Extension to Text-Based Image Retrieval." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020.](https://mlanthology.org/cvprw/2020/parcalabescu2020cvprw-exploring/) doi:10.1109/CVPRW50498.2020.00489
BibTeX
@inproceedings{parcalabescu2020cvprw-exploring,
title = {{Exploring Phrase Grounding Without Training: Contextualisation and Extension to Text-Based Image Retrieval}},
author = {Parcalabescu, Letitia and Frank, Anette},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2020},
pages = {4137--4146},
doi = {10.1109/CVPRW50498.2020.00489},
url = {https://mlanthology.org/cvprw/2020/parcalabescu2020cvprw-exploring/}
}