How to Configure Good In-Context Sequence for Visual Question Answering
Abstract
Inspired by the success of Large Language Models in dealing with new tasks via In-Context Learning (ICL) in NLP, researchers have also developed Large Vision-Language Models (LVLMs) with ICL capabilities. However, when implementing ICL with these LVLMs, researchers usually resort to the simplest approaches, such as random sampling, to configure the in-context sequence, which leads to sub-optimal results. To enhance ICL performance, in this study we use Visual Question Answering (VQA) as a case study to explore diverse in-context configurations and find the most effective ones. Additionally, by observing how the LVLM outputs change as the in-context sequence is altered, we gain insights into the inner properties of LVLMs, improving our understanding of them. Specifically, to explore in-context configurations, we design diverse retrieval methods and employ different strategies to manipulate the retrieved demonstrations. Through exhaustive experiments on three VQA datasets (VQAv2, VizWiz, and OK-VQA), we uncover three important inner properties of the applied LVLM and demonstrate which strategies consistently improve ICL VQA performance. Our code is provided at: https://github.com/GaryJiajia/OFv2_ICL_VQA.
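The contrast the abstract draws between random sampling and retrieval-based configuration of the in-context sequence can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy embeddings, and the `<image> Question: ... Answer: ...` prompt layout are all assumptions made for illustration, and a real system would use embeddings from an image or text encoder rather than hand-written vectors.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def random_demos(pool, k, seed=0):
    """Baseline: pick k demonstrations uniformly at random."""
    return random.Random(seed).sample(pool, k)


def retrieve_demos(query_emb, pool, k):
    """Retrieval: pick the k demonstrations whose embeddings are most similar to the query."""
    ranked = sorted(pool, key=lambda d: cosine(query_emb, d["emb"]), reverse=True)
    return ranked[:k]


def build_prompt(demos, question):
    """Join demonstrations and the query into one sequence (hypothetical layout)."""
    parts = [f"<image> Question: {d['question']} Answer: {d['answer']}" for d in demos]
    parts.append(f"<image> Question: {question} Answer:")
    return "\n".join(parts)


# Toy demonstration pool; the "emb" vectors are made up for illustration.
pool = [
    {"question": "What color is the bus?", "answer": "red", "emb": [1.0, 0.0, 0.0]},
    {"question": "How many dogs are there?", "answer": "2", "emb": [0.0, 1.0, 0.0]},
    {"question": "What color is the car?", "answer": "blue", "emb": [0.9, 0.1, 0.0]},
]
query_emb = [1.0, 0.05, 0.0]  # assumed embedding of the query sample

demos = retrieve_demos(query_emb, pool, k=2)
prompt = build_prompt(demos, "What color is the truck?")
print(prompt)
```

Here the two color-related demonstrations are retrieved ahead of the counting question because their toy embeddings lie closest to the query; swapping `retrieve_demos` for `random_demos` gives the random-sampling baseline.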
Cite
Text
Li et al. "How to Configure Good In-Context Sequence for Visual Question Answering." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.02522
Markdown
[Li et al. "How to Configure Good In-Context Sequence for Visual Question Answering." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/li2024cvpr-configure/) doi:10.1109/CVPR52733.2024.02522
BibTeX
@inproceedings{li2024cvpr-configure,
title = {{How to Configure Good In-Context Sequence for Visual Question Answering}},
author = {Li, Li and Peng, Jiawei and Chen, Huiyi and Gao, Chongyang and Yang, Xu},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2024},
pages = {26710-26720},
doi = {10.1109/CVPR52733.2024.02522},
url = {https://mlanthology.org/cvpr/2024/li2024cvpr-configure/}
}