GLEN: Generalized Focal Loss Ensemble of Low-Rank Networks for Calibrated Visual Question Answering

Mozaffari, Mahsa; Sapkota, Hitesh; Yu, Qi

doi:10.1609/AAAI.V39I18.34154

AAAI 2025 pp. 19563-19571

Abstract

Deep learning models with large-scale backbones are increasingly adopted to tackle complex visual question answering (VQA) problems in real-world settings. While they provide the learning capacity needed to handle high-dimensional, multimodal VQA data, these models tend to suffer from the memorization effect, which leads to overconfident predictions. This can significantly limit their applicability in critical domains (e.g., medicine, cyber-security, and public safety), where confidently wrong predictions may have severe consequences. In this work, we propose a novel low-rank network factorization that results in much better-calibrated networks. These low-rank factorized networks are then aggregated into an ensemble guided by a generalized focal loss to further improve overall performance and calibration. The overall framework, referred to as the Generalized focal Loss Ensemble of low-rank Networks (GLEN), is an important step toward developing well-calibrated VQA models. We theoretically demonstrate that the generalized focal loss provides a more balanced bias-variance trade-off, which is guaranteed to lower the confidence of incorrect predictions and thereby improves calibration. Extensive experiments on benchmark datasets and comparisons across various VQA models show that GLEN achieves much better calibration on both in-distribution and out-of-distribution data without sacrificing VQA accuracy.
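
To make the notions of overconfidence and calibration in the abstract concrete, the following is a minimal Python sketch. It is not the GLEN method: the abstract does not define the generalized focal loss or name a calibration metric, so the block uses the standard focal loss and the commonly used expected calibration error (ECE) as stand-ins. The function names, the `gamma` parameter, and the equal-width binning are illustrative assumptions.

```python
import math


def focal_loss(p_true, gamma=2.0):
    """Standard focal loss for the probability assigned to the true class.

    Assumption: this is the standard focal loss, not the paper's
    generalized variant. With gamma = 0 it reduces to cross-entropy.
    """
    return -((1.0 - p_true) ** gamma) * math.log(p_true)


def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin.

    Assumption: equal-width bins over [0, 1]; the metric used in the
    paper is not stated in the abstract.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 falls in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


# Hypothetical predictions: half are right in both cases.
outcomes = [True, False, True, False]
overconfident = expected_calibration_error([0.95] * 4, outcomes)
calibrated = expected_calibration_error([0.5] * 4, outcomes)
```

In this toy case the overconfident predictor reports 95% confidence while being right half the time, so its ECE is large, whereas the predictor reporting 50% confidence has an ECE of zero. The focal term `(1 - p_true) ** gamma` shrinks the loss on examples the model already predicts confidently, which is the general mechanism by which focal-style losses discourage overconfidence.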

Cite

Text

Mozaffari et al. "GLEN: Generalized Focal Loss Ensemble of Low-Rank Networks for Calibrated Visual Question Answering." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I18.34154

Markdown

[Mozaffari et al. "GLEN: Generalized Focal Loss Ensemble of Low-Rank Networks for Calibrated Visual Question Answering." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/mozaffari2025aaai-glen/) doi:10.1609/AAAI.V39I18.34154

BibTeX

@inproceedings{mozaffari2025aaai-glen,
  title     = {{GLEN: Generalized Focal Loss Ensemble of Low-Rank Networks for Calibrated Visual Question Answering}},
  author    = {Mozaffari, Mahsa and Sapkota, Hitesh and Yu, Qi},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {19563--19571},
  doi       = {10.1609/AAAI.V39I18.34154},
  url       = {https://mlanthology.org/aaai/2025/mozaffari2025aaai-glen/}
}