Entropy-Adaptive Diffusion Policy Optimization with Dynamic Step Alignment
Abstract
While fine-tuning diffusion models with reinforcement learning (RL) has demonstrated effectiveness in directly optimizing downstream objectives, existing RL frameworks are prone to overfitting the rewards, leading to outputs that deviate from the true data distribution and exhibit reduced diversity. To address this issue, we introduce entropy as a quantitative measure to enhance the exploratory capacity of diffusion models' denoising policies. We propose an adaptive mechanism that dynamically adjusts the application and magnitude of entropy and regularization, guided by real-time quality estimation of intermediate noised states. Theoretically, we prove the convergence of our entropy-enhanced policy optimization and establish two critical properties: 1) global entropy increases through training, ensuring robust exploration capabilities, and 2) entropy systematically decreases during the denoising process, enabling a phase transition from early-stage diversity promotion to late-stage distributional fidelity. Building on this foundation, we propose a plug-and-play RL module that adaptively controls entropy and optimizes denoising steps. Extensive evaluations demonstrate the theoretical soundness and empirical robustness of our method, achieving state-of-the-art quality-diversity trade-offs across benchmarks. Notably, our framework significantly improves the rewards and reduces denoising steps in training by up to 40%.
Cite
Text
Yan et al. "Entropy-Adaptive Diffusion Policy Optimization with Dynamic Step Alignment." International Conference on Computer Vision, 2025.
Markdown
[Yan et al. "Entropy-Adaptive Diffusion Policy Optimization with Dynamic Step Alignment." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/yan2025iccv-entropyadaptive/)
BibTeX
@inproceedings{yan2025iccv-entropyadaptive,
title = {{Entropy-Adaptive Diffusion Policy Optimization with Dynamic Step Alignment}},
author = {Yan, RenYe and Cheng, Jikang and Gan, Yaozhong and Sun, Shikun and Wu, You and Yang, Yunfan and Ling, Liang and Lin, Jinlong and Zhu, Yeshuang and Zhou, Jie and Zhang, Jinchao and Xing, Junliang and Cai, Yimao and Huang, Ru},
booktitle = {International Conference on Computer Vision},
year = {2025},
pages = {1924-1934},
url = {https://mlanthology.org/iccv/2025/yan2025iccv-entropyadaptive/}
}