Knowledge Exchange with Confidence: Cost-Effective LLM Integration for Reliable and Efficient Visual Question Answering

Abstract

Recent advances in large language models (LLMs) have improved the accuracy of visual question answering (VQA) systems. However, directly applying LLMs to VQA still presents several challenges: (a) suboptimal performance when handling questions from specialized domains, (b) higher computational costs and slower inference speed due to large model sizes, and (c) the absence of a systematic approach to precisely quantify the uncertainty of LLM responses, raising concerns about their reliability in high-stakes tasks. To address these issues, we propose an UNcertainty-aware LLM-Integrated VQA model (Uni-VQA). This model facilitates knowledge exchange between the LLM and a calibrated task-specific model (i.e., TS-VQA), guided by reliable confidence scores, resulting in improved VQA accuracy, reliability, and inference speed. Our framework strategically leverages these confidence scores to manage the interaction between the LLM and TS-VQA: specialized questions are answered by the TS-VQA model, while general knowledge questions are handled by the LLM. For questions requiring both specialized and general knowledge, TS-VQA provides candidate answers, which the LLM then combines with its internal knowledge to generate a more accurate response. Extensive experiments on VQA datasets demonstrate the theoretically justified advantages of Uni-VQA over using the LLM or TS-VQA alone.
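
The three-way routing described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how confidence scores are turned into routing decisions, so the sketch assumes that only the task-specific model's calibrated confidence is used and that two thresholds (`high`, `low`) separate the three cases. The names `TSPrediction`, `route`, `high`, `low`, and `top_k` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class TSPrediction:
    """Hypothetical output of a calibrated task-specific VQA model."""
    # (answer, calibrated confidence) pairs, sorted by confidence, highest first.
    candidates: List[Tuple[str, float]]


def route(
    question: str,
    ts_pred: TSPrediction,
    llm: Callable[[str, List[str]], str],
    high: float = 0.8,  # assumed threshold, not taken from the paper
    low: float = 0.3,   # assumed threshold, not taken from the paper
    top_k: int = 3,
) -> Tuple[str, str]:
    """Return (answer, source) using confidence-guided routing."""
    top_answer, top_conf = ts_pred.candidates[0]

    # Confident task-specific prediction: answer directly and skip the LLM call.
    if top_conf >= high:
        return top_answer, "ts-vqa"

    # Very low confidence: treat as a general-knowledge question for the LLM alone.
    if top_conf < low:
        return llm(question, []), "llm"

    # In between: pass the top candidate answers to the LLM as hints.
    hints = [answer for answer, _ in ts_pred.candidates[:top_k]]
    return llm(question, hints), "llm+candidates"
```

Under these assumptions, the LLM is invoked only when the task-specific model is not confident, which is where the inference-cost saving would come from; `llm` stands in for any callable that takes a question and a list of candidate answers.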

Cite

Text

Mozaffari et al. "Knowledge Exchange with Confidence: Cost-Effective LLM Integration for Reliable and Efficient Visual Question Answering." International Conference on Learning Representations, 2026.

Markdown

[Mozaffari et al. "Knowledge Exchange with Confidence: Cost-Effective LLM Integration for Reliable and Efficient Visual Question Answering." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/mozaffari2026iclr-knowledge/)

BibTeX

@inproceedings{mozaffari2026iclr-knowledge,
  title     = {{Knowledge Exchange with Confidence: Cost-Effective LLM Integration for Reliable and Efficient Visual Question Answering}},
  author    = {Mozaffari, Mahsa and Sapkota, Hitesh and Liu, Xumin and Yu, Qi},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/mozaffari2026iclr-knowledge/}
}