Reinforcement Learning Meets Masked Generative Models: Mask-GRPO for Text-to-Image Generation
Abstract
Reinforcement learning (RL) has garnered increasing attention in text-to-image (T2I) generation. However, most existing RL approaches are tailored to either diffusion models or autoregressive models, overlooking an important alternative: masked generative models. In this work, we propose Mask-GRPO, the first method to incorporate Group Relative Policy Optimization (GRPO)-based RL into this overlooked paradigm. Our core insight is to redefine the transition probability, departing from current approaches, and to formulate the unmasking process as a multi-step decision-making problem. To further strengthen the method, we explore several useful strategies: removing the Kullback–Leibler constraint, applying a reduction strategy, and filtering out low-quality samples. With Mask-GRPO, the base model Show-o achieves substantial gains on standard T2I benchmarks and in preference alignment, outperforming existing state-of-the-art approaches.
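To make the GRPO ingredients in the abstract concrete, below is a minimal sketch of a group-relative advantage and a clipped policy-gradient loss over the unmasking steps, with the KL penalty dropped as the abstract describes. The function names, tensor shapes, and the per-step log-probability bookkeeping are our own assumptions for illustration, not the authors' implementation.

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Group-relative advantages: normalize rewards within each prompt's group.
    rewards: (num_prompts, group_size)
    """
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + 1e-8)

def mask_grpo_loss(logp_new: torch.Tensor,
                   logp_old: torch.Tensor,
                   advantages: torch.Tensor,
                   clip_eps: float = 0.2) -> torch.Tensor:
    """Clipped policy-gradient loss over the multi-step unmasking trajectory.
    logp_new / logp_old: (num_prompts, group_size, num_steps) log-probabilities
    of the tokens revealed at each unmasking step, under the current policy and
    the sampling (old) policy respectively.
    advantages: (num_prompts, group_size) group-relative advantages.
    """
    ratio = (logp_new - logp_old).exp()
    adv = advantages.unsqueeze(-1)            # broadcast one advantage over all steps
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv
    # No KL penalty term, mirroring the paper's choice to remove the KL constraint.
    return -torch.minimum(unclipped, clipped).mean()
```

In this sketch, each image is generated by a fixed number of unmasking steps, and the per-step log-probability is the sum of the log-probabilities of the tokens newly revealed at that step; how the paper actually redefines the transition probability is described in the full text.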
Cite
Text
Luo et al. "Reinforcement Learning Meets Masked Generative Models: Mask-GRPO for Text-to-Image Generation." Advances in Neural Information Processing Systems, 2025.
Markdown
[Luo et al. "Reinforcement Learning Meets Masked Generative Models: Mask-GRPO for Text-to-Image Generation." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/luo2025neurips-reinforcement/)
BibTeX
@inproceedings{luo2025neurips-reinforcement,
title = {{Reinforcement Learning Meets Masked Generative Models: Mask-GRPO for Text-to-Image Generation}},
author = {Luo, Yifu and Hu, Xinhao and Fan, Keyu and Sun, Haoyuan and Chen, Zeyu and Xia, Bo and Zhang, Tiantian and Chang, Yongzhe and Wang, Xueqian},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
url = {https://mlanthology.org/neurips/2025/luo2025neurips-reinforcement/}
}