MM-RLHF: The Next Step Forward in Multimodal LLM Alignment

Abstract

Existing efforts to align multimodal large language models (MLLMs) with human preferences have only achieved progress in narrow areas, such as hallucination reduction, but remain limited in practical applicability and generalizability. To this end, we introduce MM-RLHF, a dataset containing 120k fine-grained, human-annotated preference comparison pairs. This dataset represents a substantial advancement over existing resources, offering superior size, diversity, annotation granularity, and quality. Leveraging this dataset, we propose several key innovations to improve both the quality of reward models and the efficiency of alignment algorithms. Notably, we introduce the Critique-Based Reward Model, which generates critiques of model outputs before assigning scores, offering enhanced interpretability and more informative feedback compared to traditional scalar reward mechanisms. Additionally, we propose Dynamic Reward Scaling, a method that adjusts the loss weight of each sample according to the reward signal, thereby optimizing the use of high-quality comparison pairs. Our approach is rigorously evaluated across 10 distinct dimensions, encompassing 27 benchmarks, with results demonstrating significant and consistent improvements in model performance (Figure 1).
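The Dynamic Reward Scaling idea described above can be sketched as follows. This is an illustrative interpretation, not the paper's exact formulation: it assumes a per-pair reward margin (chosen-minus-rejected score from the reward model) and a bounded scaling function; the function names and the `tanh`-based weighting are hypothetical choices for the sketch.

```python
import math

def dynamic_weights(reward_margins, k=1.0):
    """Map each preference pair's reward margin to a loss weight.
    Larger margins (clearer preferences, per the reward model) receive
    more weight; tanh keeps weights bounded in [1 - k, 1 + k].
    The actual MM-RLHF scaling function may differ."""
    return [1.0 + k * math.tanh(m) for m in reward_margins]

def weighted_preference_loss(per_sample_losses, reward_margins, k=1.0):
    """Scale each sample's alignment loss term (e.g., a DPO-style loss)
    by its dynamic weight, then take the weighted average."""
    weights = dynamic_weights(reward_margins, k)
    total = sum(weights)
    return sum(w * l for w, l in zip(weights, per_sample_losses)) / total
```

A pair with a zero margin keeps a neutral weight of 1.0, while high-confidence pairs contribute proportionally more to the batch loss, which matches the stated goal of making better use of high-quality comparison pairs.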

Cite

Text

Zhang et al. "MM-RLHF: The Next Step Forward in Multimodal LLM Alignment." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Zhang et al. "MM-RLHF: The Next Step Forward in Multimodal LLM Alignment." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/zhang2025icml-mmrlhf/)

BibTeX

@inproceedings{zhang2025icml-mmrlhf,
  title     = {{MM-RLHF: The Next Step Forward in Multimodal LLM Alignment}},
  author    = {Zhang, Yifan and Yu, Tao and Tian, Haochen and Fu, Chaoyou and Li, Peiyan and Zeng, Jianshu and Xie, Wulin and Shi, Yang and Zhang, Huanyu and Wu, Junkang and Wang, Xue and Hu, Yibo and Wen, Bin and Gao, Tingting and Zhang, Zhang and Yang, Fan and Zhang, Di and Wang, Liang and Jin, Rong},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {76625--76654},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/zhang2025icml-mmrlhf/}
}