How to Configure Good In-Context Sequence for Visual Question Answering

Abstract

Inspired by the success of Large Language Models in dealing with new tasks via In-Context Learning (ICL) in NLP, researchers have also developed Large Vision-Language Models (LVLMs) with ICL capabilities. However, when implementing ICL with these LVLMs, researchers usually resort to the simplest configuration, such as random sampling, to build the in-context sequence, which leads to sub-optimal results. To enhance ICL performance, in this study we use Visual Question Answering (VQA) as a case study to explore diverse in-context configurations and identify the effective ones. Additionally, by observing how the LVLM outputs change as the in-context sequence is altered, we gain insights into the inner properties of LVLMs, improving our understanding of them. Specifically, to explore in-context configurations, we design diverse retrieval methods and employ different strategies to manipulate the retrieved demonstrations. Through exhaustive experiments on three VQA datasets, VQAv2, VizWiz, and OK-VQA, we uncover three important inner properties of the applied LVLM and demonstrate which strategies can consistently improve the ICL VQA performance. Our code is available at: https://github.com/GaryJiajia/OFv2_ICL_VQA.
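The abstract contrasts random sampling with retrieval-based in-context configuration. As a rough illustration of the latter idea (not the paper's actual implementation), the sketch below selects the demonstrations most similar to a query by cosine similarity over precomputed embeddings and concatenates them into an in-context sequence; the function names, the prompt template, and the embedding fields are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve_demonstrations(query_emb, pool, k=4):
    """Return the k pool entries whose embeddings are most similar
    to the query embedding (similarity-based retrieval, as opposed
    to random sampling)."""
    ranked = sorted(pool, key=lambda d: cosine(query_emb, d["emb"]), reverse=True)
    return ranked[:k]

def build_icl_prompt(demos, question):
    """Concatenate retrieved demonstrations into an in-context
    sequence, ending with the unanswered query (illustrative
    text-only template; the paper's setting is multimodal)."""
    parts = [f"Question: {d['question']} Answer: {d['answer']}" for d in demos]
    parts.append(f"Question: {question} Answer:")
    return "\n".join(parts)
```

In practice the embeddings would come from an image and/or text encoder, and each demonstration would pair an image with its question-answer text; the ranking and concatenation logic stays the same.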

Cite

Text

Li et al. "How to Configure Good In-Context Sequence for Visual Question Answering." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.02522

Markdown

[Li et al. "How to Configure Good In-Context Sequence for Visual Question Answering." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/li2024cvpr-configure/) doi:10.1109/CVPR52733.2024.02522

BibTeX

@inproceedings{li2024cvpr-configure,
  title     = {{How to Configure Good In-Context Sequence for Visual Question Answering}},
  author    = {Li, Li and Peng, Jiawei and Chen, Huiyi and Gao, Chongyang and Yang, Xu},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {26710--26720},
  doi       = {10.1109/CVPR52733.2024.02522},
  url       = {https://mlanthology.org/cvpr/2024/li2024cvpr-configure/}
}