LLaVA-Critic: Learning to Evaluate Multimodal Models

Abstract

We introduce LLaVA-Critic, the first open-source large multimodal model (LMM) designed as a generalist evaluator to assess performance across a wide range of multimodal tasks. LLaVA-Critic is trained using a high-quality critic instruction-following dataset that incorporates diverse evaluation criteria and scenarios. Our experiments demonstrate the model's effectiveness in two key areas: (i) LMM-as-a-Judge, where LLaVA-Critic provides reliable evaluation scores, performing on par with or surpassing GPT models on multiple evaluation benchmarks; and (ii) Preference Learning, where it generates reward signals that enhance model alignment. This work underscores the potential of open-source LMMs in self-critique and evaluation, setting the stage for future research into scalable, superhuman alignment feedback mechanisms for LMMs.

Cite

Text

Xiong et al. "LLaVA-Critic: Learning to Evaluate Multimodal Models." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.01271

Markdown

[Xiong et al. "LLaVA-Critic: Learning to Evaluate Multimodal Models." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/xiong2025cvpr-llavacritic/) doi:10.1109/CVPR52734.2025.01271

BibTeX

@inproceedings{xiong2025cvpr-llavacritic,
  title     = {{LLaVA-Critic: Learning to Evaluate Multimodal Models}},
  author    = {Xiong, Tianyi and Wang, Xiyao and Guo, Dong and Ye, Qinghao and Fan, Haoqi and Gu, Quanquan and Huang, Heng and Li, Chunyuan},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {13618--13628},
  doi       = {10.1109/CVPR52734.2025.01271},
  url       = {https://mlanthology.org/cvpr/2025/xiong2025cvpr-llavacritic/}
}
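
A minimal LaTeX sketch of citing this entry in a paper; the file name references.bib is an assumption (save the BibTeX entry above under that name):

\documentclass{article}
\begin{document}
LLaVA-Critic~\cite{xiong2025cvpr-llavacritic} is an open-source LMM trained as a generalist evaluator.
\bibliographystyle{ieeetr}
\bibliography{references} % assumed file holding the BibTeX entry above
\end{document}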