EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling

Luo, Xin; Wang, Jiahao; Wu, Chenyuan; Xiao, Shitao; Jiang, Xiyan; Lian, Defu; Zhang, Jiajun; Liu, Dong; Liu, Zheng

ICLR 2026

Abstract

Instruction-guided image editing has achieved remarkable progress, yet current models still struggle with complex instructions and often require multiple samples to produce a desired result. Reinforcement Learning (RL) offers a promising solution, but its adoption in image editing has been severely hindered by the lack of a high-fidelity, efficient reward signal. In this work, we present a comprehensive methodology to overcome this barrier, centered on the development of a state-of-the-art, specialized reward model. We first introduce EditReward-Bench, a benchmark for systematically evaluating reward models on editing quality. Guided by this benchmark, we develop EditScore, an efficient model for evaluating the quality of instruction-guided edits. Through meticulous data curation and filtering, EditScore effectively matches the performance of leading proprietary VLMs. Furthermore, when coupled with an effective self-ensemble strategy tailored to the generative nature of EditScore, our largest variant even surpasses GPT-5 on the benchmark. We then demonstrate that a high-fidelity reward model is the key to unlocking online RL for image editing. Our experiments show that, while even the largest open-source VLMs fail to provide an effective learning signal, EditScore enables efficient and robust policy optimization. Applying our framework to a strong base model, OmniGen2, yields a final model with a substantial and consistent performance uplift. Overall, this work provides the first systematic path from benchmarking to reward modeling to RL training in image editing, showing that a high-fidelity, domain-specialized reward model is the key to unlocking the full potential of RL in this domain. Our code, models, and benchmark will be released publicly.
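The abstract mentions a self-ensemble strategy that exploits the generative nature of EditScore. A minimal sketch of one plausible reading follows, assuming that "self-ensemble" means querying the same generative reward model several times on one (instruction, source image, edited image) input and averaging the parsed scores. The function `self_ensemble_score` and the `score_once` callable (one stochastic scoring pass, indexed by a seed) are hypothetical illustrations, not the released EditScore API.

```python
import random
import statistics
from typing import Callable


def self_ensemble_score(score_once: Callable[[int], float], n_samples: int = 4) -> float:
    """Average several independent scoring passes of a generative reward model.

    `score_once(seed)` is a hypothetical stand-in for one sampled reward-model
    call that returns a numeric score for a fixed editing example.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be >= 1")
    # Each pass may decode a different score because the model samples its output.
    scores = [score_once(seed) for seed in range(n_samples)]
    # The ensembled reward is the arithmetic mean of the sampled scores.
    return statistics.fmean(scores)


if __name__ == "__main__":
    # Hypothetical scorer: a "true" quality of 7.0 plus per-pass sampling noise.
    def noisy_scorer(seed: int) -> float:
        return 7.0 + random.Random(seed).gauss(0.0, 1.0)

    print("1 pass:   ", self_ensemble_score(noisy_scorer, n_samples=1))
    print("16 passes:", self_ensemble_score(noisy_scorer, n_samples=16))
```

Averaging independent passes reduces the variance of the reward signal at the cost of extra inference calls, which is one reason a lower-noise reward can matter for online RL; how many passes the paper actually uses is not stated here.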

Cite

Text

Luo et al. "EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling." International Conference on Learning Representations, 2026.

Markdown

[Luo et al. "EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/luo2026iclr-editscore/)

BibTeX

@inproceedings{luo2026iclr-editscore,
  title     = {{EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling}},
  author    = {Luo, Xin and Wang, Jiahao and Wu, Chenyuan and Xiao, Shitao and Jiang, Xiyan and Lian, Defu and Zhang, Jiajun and Liu, Dong and Liu, Zheng},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/luo2026iclr-editscore/}
}