Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization

Zhang, Tao; Da, Cheng; Ding, Kun; Yang, Huan; Jin, Kun; Li, Yan; Gao, Tingting; Zhang, Di; Xiang, Shiming; Pan, Chunhong

NeurIPS 2025

Abstract

Preference optimization for diffusion models aims to align them with human preferences for images. Previous methods typically use Vision-Language Models (VLMs) as pixel-level reward models to approximate human preferences. However, when used for step-level preference optimization, these models face challenges in handling noisy images of different timesteps and require complex transformations into pixel space. In this work, we show that pre-trained diffusion models are naturally suited for step-level reward modeling in the noisy latent space, as they are explicitly designed to process latent images at various noise levels. Accordingly, we propose the **Latent Reward Model (LRM)**, which repurposes components of the diffusion model to predict preferences of latent images at arbitrary timesteps. Building on LRM, we introduce **Latent Preference Optimization (LPO)**, a step-level preference optimization method conducted directly in the noisy latent space. Experimental results indicate that LPO significantly improves the model's alignment with general, aesthetic, and text-image alignment preferences, while achieving a 2.5-28x training speedup over existing preference optimization methods.
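As a rough illustration of what preference modeling on noisy latents at a given timestep can look like, the sketch below noises a pair of latents to a shared timestep with the standard DDPM forward process and scores the pair with a Bradley-Terry pairwise loss. It is not the paper's LRM or LPO objective, which the abstract does not specify: `toy_latent_reward`, the linear beta schedule, the latent shape, and all parameter values are assumptions chosen purely for illustration.

```python
import numpy as np


def alpha_bar(t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal level at timestep t for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return float(np.cumprod(1.0 - betas)[t])


def add_noise(x0, t, rng):
    """DDPM forward process: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps."""
    a = alpha_bar(t)
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps


def toy_latent_reward(x_t, t, w):
    """Hypothetical stand-in for a noise-aware reward model.

    A real model would be a network conditioned on the timestep; here it is
    just a linear score scaled by the signal level at t.
    """
    return float(np.sqrt(alpha_bar(t)) * np.dot(w, x_t.ravel()))


def pairwise_preference_loss(r_win, r_lose):
    """Bradley-Terry loss: -log sigmoid(r_win - r_lose), computed stably."""
    return float(np.logaddexp(0.0, -(r_win - r_lose)))


# Toy usage: two latents of an assumed shape (4, 8, 8), compared at t = 500.
rng = np.random.default_rng(0)
x_win = rng.standard_normal((4, 8, 8))
x_lose = rng.standard_normal((4, 8, 8))
w = rng.standard_normal(4 * 8 * 8)  # hypothetical reward weights
t = 500

loss = pairwise_preference_loss(
    toy_latent_reward(add_noise(x_win, t, rng), t, w),
    toy_latent_reward(add_noise(x_lose, t, rng), t, w),
)
```

Because both latents are scored at the same noise level without decoding to pixels, the comparison stays entirely in latent space, which is the property the abstract attributes to step-level reward modeling with a diffusion backbone.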

Cite

Text

Zhang et al. "Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization." Advances in Neural Information Processing Systems, 2025.

Markdown

[Zhang et al. "Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/zhang2025neurips-diffusion/)

BibTeX

@inproceedings{zhang2025neurips-diffusion,
  title     = {{Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization}},
  author    = {Zhang, Tao and Da, Cheng and Ding, Kun and Yang, Huan and Jin, Kun and Li, Yan and Gao, Tingting and Zhang, Di and Xiang, Shiming and Pan, Chunhong},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/zhang2025neurips-diffusion/}
}