Knowledge Exchange with Confidence: Cost-Effective LLM Integration for Reliable and Efficient Visual Question Answering

Mozaffari, Mahsa; Sapkota, Hitesh; Liu, Xumin; Yu, Qi

ICLR 2026

Abstract

Recent advances in large language models (LLMs) have improved the accuracy of visual question answering (VQA) systems. However, directly applying LLMs to VQA still presents several challenges: (a) suboptimal performance when handling questions from specialized domains, (b) higher computational costs and slower inference speed due to large model sizes, and (c) the absence of a systematic approach to precisely quantify the uncertainty of LLM responses, raising concerns about their reliability in high-stakes tasks. To address these issues, we propose an UNcertainty-aware LLM-Integrated VQA model ($\texttt{Uni-VQA}$). This model facilitates knowledge exchange between the LLM and a calibrated task-specific model (i.e., $\texttt{TS-VQA}$), guided by reliable confidence scores, resulting in improved VQA accuracy, reliability, and inference speed. Our framework strategically leverages these confidence scores to manage the interaction between the LLM and $\texttt{TS-VQA}$: specialized questions are answered by the $\texttt{TS-VQA}$ model, while general-knowledge questions are handled by the LLM. For questions requiring both specialized and general knowledge, $\texttt{TS-VQA}$ provides candidate answers, which the LLM then combines with its internal knowledge to generate a more accurate response. Extensive experiments on VQA datasets demonstrate the theoretically justified advantages of $\texttt{Uni-VQA}$ over using the LLM or $\texttt{TS-VQA}$ alone.
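
The three-way routing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how confidence scores are turned into routing decisions, so the two thresholds (`high`, `low`), the `top_k` cutoff, the class name `ConfidenceRouter`, and the callable signatures for the task-specific model and the LLM are all hypothetical, chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# (answer, calibrated confidence in [0, 1]); a hypothetical interface.
Candidate = Tuple[str, float]


@dataclass
class ConfidenceRouter:
    """Illustrative confidence-guided router; thresholds are assumptions."""

    # Hypothetical task-specific model: (image, question) -> candidates.
    ts_vqa: Callable[[str, str], List[Candidate]]
    # Hypothetical LLM wrapper: (image, question, candidate hints) -> answer.
    llm: Callable[[str, str, List[str]], str]
    high: float = 0.8  # assumed: at or above this, trust TS-VQA directly
    low: float = 0.3   # assumed: below this, defer fully to the LLM
    top_k: int = 3     # assumed: number of candidates passed to the LLM

    def answer(self, image: str, question: str) -> Tuple[str, str]:
        """Return (answer, route), where route names the path taken."""
        candidates = sorted(
            self.ts_vqa(image, question), key=lambda c: c[1], reverse=True
        )
        if not candidates:
            # No task-specific evidence at all: the LLM answers alone.
            return self.llm(image, question, []), "llm"

        best_answer, best_conf = candidates[0]
        if best_conf >= self.high:
            # Specialized question: the task-specific model answers.
            return best_answer, "ts_vqa"
        if best_conf < self.low:
            # General-knowledge question: the LLM answers alone.
            return self.llm(image, question, []), "llm"

        # Mixed case: TS-VQA candidates are handed to the LLM as hints.
        hints = [ans for ans, _ in candidates[: self.top_k]]
        return self.llm(image, question, hints), "llm_with_candidates"
```

In this sketch only the low-confidence and mixed cases call the LLM, which is one way the cost savings mentioned in the abstract could arise. It also assumes the task-specific model's confidences are calibrated, since uncalibrated scores would make any fixed threshold unreliable.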

Cite

Text

Mozaffari et al. "Knowledge Exchange with Confidence: Cost-Effective LLM Integration for Reliable and Efficient Visual Question Answering." International Conference on Learning Representations, 2026.

Markdown

[Mozaffari et al. "Knowledge Exchange with Confidence: Cost-Effective LLM Integration for Reliable and Efficient Visual Question Answering." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/mozaffari2026iclr-knowledge/)

BibTeX

@inproceedings{mozaffari2026iclr-knowledge,
  title     = {{Knowledge Exchange with Confidence: Cost-Effective LLM Integration for Reliable and Efficient Visual Question Answering}},
  author    = {Mozaffari, Mahsa and Sapkota, Hitesh and Liu, Xumin and Yu, Qi},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/mozaffari2026iclr-knowledge/}
}