RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness

Yu, Tianyu; Zhang, Haoye; Li, Qiming; Xu, Qixin; Yao, Yuan; Chen, Da; Lu, Xiaoman; Cui, Ganqu; Dang, Yunkai; He, Taiwen; Feng, Xiaocheng; Song, Jun; Zheng, Bo; Liu, Zhiyuan; Chua, Tat-Seng; Sun, Maosong

doi:10.1109/CVPR52734.2025.01861

CVPR 2025 pp. 19985-19995

Abstract

Traditional feedback learning for hallucination reduction relies on labor-intensive manual labeling or expensive proprietary models. This leaves the community without foundational knowledge about how to build high-quality feedback with open-source MLLMs. In this work, we introduce RLAIF-V, a novel framework that aligns MLLMs in a fully open-source paradigm. RLAIF-V maximally explores open-source MLLMs from two perspectives, including high-quality feedback data generation for preference learning and self-feedback guidance for inference-time scaling. Extensive experiments on seven benchmarks in both automatic and human evaluation show that RLAIF-V substantially enhances the trustworthiness of models at both preference learning and inference time. RLAIF-V 7B reduces object hallucination by 80.7% and overall hallucination by 33.7%. Remarkably, RLAIF-V 12B further reveals the self-alignment potential of open-source MLLMs, where the model can learn from feedback of itself to achieve super GPT-4V trustworthiness.

Cite

Text

Yu et al. "RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.01861

Markdown

[Yu et al. "RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/yu2025cvpr-rlaifv/) doi:10.1109/CVPR52734.2025.01861

BibTeX

@inproceedings{yu2025cvpr-rlaifv,
  title     = {{RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness}},
  author    = {Yu, Tianyu and Zhang, Haoye and Li, Qiming and Xu, Qixin and Yao, Yuan and Chen, Da and Lu, Xiaoman and Cui, Ganqu and Dang, Yunkai and He, Taiwen and Feng, Xiaocheng and Song, Jun and Zheng, Bo and Liu, Zhiyuan and Chua, Tat-Seng and Sun, Maosong},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {19985--19995},
  doi       = {10.1109/CVPR52734.2025.01861},
  url       = {https://mlanthology.org/cvpr/2025/yu2025cvpr-rlaifv/}
}