Test-Time Alignment of Diffusion Models Without Reward Over-Optimization

Abstract

Diffusion models excel in generative tasks, but aligning them with specific objectives while maintaining their versatility remains challenging. Existing fine-tuning methods often suffer from reward over-optimization, while approximate guidance approaches fail to optimize target rewards effectively. Addressing these limitations, we propose a training-free, test-time method based on Sequential Monte Carlo (SMC) to sample from the reward-aligned target distribution. Our approach, tailored for diffusion sampling and incorporating tempering techniques, achieves target rewards comparable or superior to those of fine-tuning methods while preserving diversity and cross-reward generalization. We demonstrate its effectiveness in single-reward optimization, multi-objective scenarios, and online black-box optimization. This work offers a robust solution for aligning diffusion models with diverse downstream objectives without compromising their general capabilities. Code is available at https://github.com/krafton-ai/DAS.
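To make the abstract's idea concrete, below is a minimal, hypothetical sketch of SMC sampling with a tempered reward over a reverse diffusion chain. It is not the authors' DAS implementation: the functions `denoise_step`, `predict_x0`, and `reward_fn`, the toy Gaussian dynamics, and the linear tempering schedule are all illustrative assumptions standing in for a real diffusion model and reward model.

```python
# Illustrative sketch (not the authors' code) of SMC with a tempered reward:
# particles follow the reverse diffusion chain, are reweighted by a reward
# potential whose scale is annealed from 0 to alpha, and are resampled when
# the effective sample size collapses.
import numpy as np

def denoise_step(x_t, t):
    """Toy stand-in for one reverse-diffusion step p(x_{t-1} | x_t)."""
    mean = 0.9 * x_t           # placeholder drift toward the data mode
    std = 0.1 * (t + 1)        # placeholder noise schedule
    return mean + std * np.random.randn(*x_t.shape)

def predict_x0(x_t, t):
    """Toy stand-in for the model's clean-sample prediction E[x_0 | x_t]."""
    return 0.9 ** (t + 1) * x_t

def reward_fn(x0):
    """Hypothetical reward: prefer samples close to a target value of 2.0."""
    return -np.sum((x0 - 2.0) ** 2, axis=-1)

def smc_align(num_particles=64, num_steps=50, dim=2, alpha=1.0):
    """Approximately sample from the reward-tilted distribution
    p(x) * exp(alpha * r(x)) / Z via SMC over the reverse chain."""
    x = np.random.randn(num_particles, dim)   # particles at t = T
    log_w = np.zeros(num_particles)           # log importance weights
    prev_potential = np.zeros(num_particles)

    for step in range(num_steps):
        t = num_steps - 1 - step
        x = denoise_step(x, t)

        # Tempering: ramp the reward scale from 0 to alpha so intermediate
        # targets interpolate between the base model and the tilted target.
        beta = alpha * (step + 1) / num_steps
        potential = beta * reward_fn(predict_x0(x, t))
        log_w += potential - prev_potential
        prev_potential = potential

        # Multinomial resampling when the effective sample size drops.
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        ess = 1.0 / np.sum(w ** 2)
        if ess < num_particles / 2:
            idx = np.random.choice(num_particles, size=num_particles, p=w)
            x, prev_potential = x[idx], prev_potential[idx]
            log_w = np.zeros(num_particles)

    return x

if __name__ == "__main__":
    samples = smc_align()
    print("mean reward:", reward_fn(samples).mean())
```

Because the reweighting only tilts the base model's own samples, the sketch never updates model weights, which is what the abstract means by a training-free, test-time method that avoids reward over-optimization.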

Cite

Text

Kim et al. "Test-Time Alignment of Diffusion Models Without Reward Over-Optimization." International Conference on Learning Representations, 2025.

Markdown

[Kim et al. "Test-Time Alignment of Diffusion Models Without Reward Over-Optimization." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/kim2025iclr-testtime-a/)

BibTeX

@inproceedings{kim2025iclr-testtime-a,
  title     = {{Test-Time Alignment of Diffusion Models Without Reward Over-Optimization}},
  author    = {Kim, Sunwoo and Kim, Minkyu and Park, Dongmin},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/kim2025iclr-testtime-a/}
}