Cross-Modal Dense Passage Retrieval for Outside Knowledge Visual Question Answering
Abstract
In many language processing tasks, most notably large language modeling, retrieval augmentation improves model performance by supplying information at inference time that may not be encoded in the model's weights. This technique has proven particularly useful in multimodal settings. For some tasks, such as Outside Knowledge Visual Question Answering (OK-VQA), retrieval augmentation is required given the open-ended nature of the knowledge involved. In much prior work on the OK-VQA task, the retriever is either a unimodal language retriever or an untrained cross-modal retriever. In this work, we present a weakly supervised training approach for cross-modal retrievers. Our method draws inspiration from dense passage retrieval in natural language processing and extends those techniques to cross-modal retrieval. Since the OK-VQA task does not typically come with consistent ground-truth retrieval labels, we evaluate our model using the lexical overlap between the ground truth and the retrieved passage. Our approach improved recall by an average of 28% over a baseline backbone network across a wide range of retrieval sizes.
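
Since the abstract gives no implementation details, the Python sketch below illustrates one plausible form of the lexical-overlap recall metric it describes: a retrieved passage counts as a hit when it covers the tokens of a ground-truth answer. The function names, whitespace tokenization, and overlap threshold are illustrative assumptions, not the paper's exact procedure.

```python
# Hypothetical sketch of lexical-overlap recall@k, assuming a simple
# whitespace tokenizer and a coverage threshold; not the paper's exact setup.
from typing import List


def token_overlap(passage: str, answer: str) -> float:
    """Fraction of answer tokens that also appear in the passage."""
    passage_tokens = set(passage.lower().split())
    answer_tokens = answer.lower().split()
    if not answer_tokens:
        return 0.0
    hits = sum(1 for tok in answer_tokens if tok in passage_tokens)
    return hits / len(answer_tokens)


def recall_at_k(retrieved: List[str], answers: List[str], k: int,
                threshold: float = 1.0) -> float:
    """Return 1.0 if any of the top-k passages lexically covers some answer."""
    for passage in retrieved[:k]:
        if any(token_overlap(passage, ans) >= threshold for ans in answers):
            return 1.0
    return 0.0
```

Averaging `recall_at_k` over a question set, for several values of `k`, would yield the kind of recall-versus-retrieval-size comparison the abstract reports.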
Cite
Text
Reichman and Heck. "Cross-Modal Dense Passage Retrieval for Outside Knowledge Visual Question Answering." IEEE/CVF International Conference on Computer Vision Workshops, 2023. doi:10.1109/ICCVW60793.2023.00304

Markdown

[Reichman and Heck. "Cross-Modal Dense Passage Retrieval for Outside Knowledge Visual Question Answering." IEEE/CVF International Conference on Computer Vision Workshops, 2023.](https://mlanthology.org/iccvw/2023/reichman2023iccvw-crossmodal/) doi:10.1109/ICCVW60793.2023.00304

BibTeX
@inproceedings{reichman2023iccvw-crossmodal,
title = {{Cross-Modal Dense Passage Retrieval for Outside Knowledge Visual Question Answering}},
author = {Reichman, Benjamin Z. and Heck, Larry},
booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
year = {2023},
pages = {2829--2834},
doi = {10.1109/ICCVW60793.2023.00304},
url = {https://mlanthology.org/iccvw/2023/reichman2023iccvw-crossmodal/}
}