Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization
Abstract
Preference optimization for diffusion models aims to align them with human preferences for images. Previous methods typically use Vision-Language Models (VLMs) as pixel-level reward models to approximate human preferences. However, when used for step-level preference optimization, these models struggle to handle noisy images at different timesteps and require complex transformations into pixel space. In this work, we show that pre-trained diffusion models are naturally suited for step-level reward modeling in the noisy latent space, as they are explicitly designed to process latent images at various noise levels. Accordingly, we propose the **Latent Reward Model (LRM)**, which repurposes components of the diffusion model to predict preferences of latent images at arbitrary timesteps. Building on LRM, we introduce **Latent Preference Optimization (LPO)**, a step-level preference optimization method conducted directly in the noisy latent space. Experimental results indicate that LPO significantly improves the model's alignment with general, aesthetic, and text-image alignment preferences, while achieving a 2.5x to 28x training speedup over existing preference optimization methods.
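The abstract does not spell out LPO's sampling scheme or training objective. As a purely illustrative sketch of what "step-level preference optimization in the noisy latent space" can look like, the toy below samples several candidate next-step latents from one denoising step, ranks them with a timestep-conditioned reward, and applies a DPO-style loss to the best/worst pair. Everything here is an assumption: `toy_reward` is a hypothetical stand-in for a noise-aware latent reward model (not LRM's architecture), the Gaussian transition and the DPO-style objective are a generic setup chosen for illustration, and none of the function names come from the paper.

```python
import math
import random

def gaussian_logpdf(x, mean, std):
    """Log-density of an isotropic Gaussian over a list-valued latent."""
    return sum(
        -0.5 * ((xi - mi) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))
        for xi, mi in zip(x, mean)
    )

def toy_reward(latent, t, target):
    """Hypothetical noise-aware reward: negative squared distance to a target.
    Dividing by (1 + t) is only a placeholder for timestep conditioning; it
    rescales scores but does not change the ranking at a fixed t."""
    dist = sum((a - b) ** 2 for a, b in zip(latent, target))
    return -dist / (1.0 + t)

def sample_candidates(mean, std, k, rng):
    """Draw k candidate next-step latents x_{t-1} ~ N(mean, std^2 I)."""
    return [[rng.gauss(m, std) for m in mean] for _ in range(k)]

def step_preference_pair(candidates, t, target):
    """Rank candidates by reward at timestep t; return (preferred, rejected)."""
    ranked = sorted(candidates, key=lambda z: toy_reward(z, t, target))
    return ranked[-1], ranked[0]

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))

def dpo_step_loss(win, lose, policy_mean, ref_mean, std, beta=1.0):
    """DPO-style loss -log sigmoid(beta * margin) for one step-level pair,
    where margin compares policy/reference log-ratios of winner and loser."""
    margin = (
        gaussian_logpdf(win, policy_mean, std) - gaussian_logpdf(win, ref_mean, std)
    ) - (
        gaussian_logpdf(lose, policy_mean, std) - gaussian_logpdf(lose, ref_mean, std)
    )
    return softplus(-beta * margin)

# Usage: one denoising step with 2-D toy latents.
rng = random.Random(0)
target = [1.0, -1.0]
ref_mean = [0.0, 0.0]
std = 0.5
cands = sample_candidates(ref_mean, std, 8, rng)
win, lose = step_preference_pair(cands, 500, target)

# Policy identical to the reference: margin is 0, so the loss is log(2).
loss_same = dpo_step_loss(win, lose, ref_mean, ref_mean, std)

# Nudging the policy mean toward the preferred sample lowers the loss.
shifted = [m + 0.1 * (w - l) for m, w, l in zip(ref_mean, win, lose)]
loss_shifted = dpo_step_loss(win, lose, shifted, ref_mean, std)
```

Because the pair is built from candidates that share the same starting latent and timestep, the reward only has to compare latents at one noise level, which is the property the abstract attributes to a noise-aware latent reward model; the real LRM and LPO details should be taken from the paper itself.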
Cite
Text
Zhang et al. "Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization." Advances in Neural Information Processing Systems, 2025.
Markdown
[Zhang et al. "Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/zhang2025neurips-diffusion/)
BibTeX
@inproceedings{zhang2025neurips-diffusion,
title = {{Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization}},
author = {Zhang, Tao and Da, Cheng and Ding, Kun and Yang, Huan and Jin, Kun and Li, Yan and Gao, Tingting and Zhang, Di and Xiang, Shiming and Pan, Chunhong},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
url = {https://mlanthology.org/neurips/2025/zhang2025neurips-diffusion/}
}