Grounding-IQA: Grounding Multimodal Language Model for Image Quality Assessment
Abstract
The development of multimodal large language models (MLLMs) enables the evaluation of image quality through natural language descriptions, which allows for more detailed assessments. However, these MLLM-based image quality assessment (IQA) methods primarily rely on general contextual descriptions, which can limit fine-grained quality assessment. To address this limitation, we introduce a new IQA task paradigm, **grounding-IQA**. This paradigm integrates multimodal referring and grounding with IQA to realize more fine-grained quality perception, thereby extending existing IQA. Specifically, grounding-IQA comprises two subtasks: grounding-IQA-description (GIQA-DES) and visual question answering (GIQA-VQA). GIQA-DES involves detailed descriptions with precise locations (e.g., bounding boxes), while GIQA-VQA focuses on quality QA for local regions. To realize grounding-IQA, we construct a corresponding dataset, GIQA-160K, through our proposed automated annotation pipeline. Furthermore, we develop a well-designed benchmark, GIQA-Bench. The benchmark evaluates grounding-IQA performance from three perspectives: description quality, VQA accuracy, and grounding precision. Experiments demonstrate that our proposed method facilitates more fine-grained IQA applications. Code: https://github.com/zhengchen1999/Grounding-IQA.
Cite
Text
Chen et al. "Grounding-IQA: Grounding Multimodal Language Model for Image Quality Assessment." International Conference on Learning Representations, 2026.
Markdown
[Chen et al. "Grounding-IQA: Grounding Multimodal Language Model for Image Quality Assessment." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/chen2026iclr-groundingiqa/)
BibTeX
@inproceedings{chen2026iclr-groundingiqa,
title = {{Grounding-IQA: Grounding Multimodal Language Model for Image Quality Assessment}},
author = {Chen, Zheng and Zhang, Xun and Li, Wenbo and Pei, Renjing and Song, Fenglong and Min, Xiongkuo and Liu, Xiaohong and Yuan, Xin and Guo, Yong and Zhang, Yulun},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/chen2026iclr-groundingiqa/}
}