Reward-Guided Diffusion Model for Data-Driven Black-Box Design Optimization

Abstract

Black-box optimization (BBO) is an important approach to design space exploration in high-dimensional domains, including fields such as materials science and robotic design. Existing diffusion models used for BBO either require a differentiable proxy or lack direct reward guidance. In this paper, we propose a reward-guided approach that treats the reverse denoising process as a Markov decision process (MDP) and trains it to increase the likelihood that the posterior generates higher-reward samples. We use the Metropolis–Hastings (MH) algorithm for Markov chain Monte Carlo (MCMC) sampling to guide the reverse process. We first pre-train the diffusion model to match the distribution of the initial data, then fine-tune it so that it acts as a policy whose parameters are adapted to generate high-reward samples. The fine-tuning is a policy gradient method in which samples are drawn from the pre-trained model to reduce the variance during training. Our experiments demonstrate that the reward-guided diffusion model achieves state-of-the-art performance across a variety of design problems, particularly those where the oracle is non-differentiable or not available as an exact function.
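
A minimal, hypothetical sketch of the MH-guided reverse step described in the abstract (not the authors' released code): denoise_step(x_t, t) is assumed to sample a candidate x_{t-1} from the diffusion model, reward_oracle is the black-box reward, and beta and n_proposals are illustrative knobs. Because every candidate is proposed by the model itself, the independence Metropolis–Hastings acceptance ratio reduces to the reward tilt exp(beta * (r_proposal - r_current)).

import math
import random

def mh_guided_reverse_step(x_t, t, denoise_step, reward_oracle, beta=1.0, n_proposals=4):
    # One reverse-diffusion step: draw several candidates for x_{t-1} from the model
    # and run independence Metropolis-Hastings over them, tilted toward high reward.
    current = denoise_step(x_t, t)                  # initial candidate for x_{t-1}
    r_curr = float(reward_oracle(current))          # black-box reward of that candidate
    for _ in range(n_proposals - 1):
        proposal = denoise_step(x_t, t)             # fresh candidate from the same model
        r_prop = float(reward_oracle(proposal))
        # Target pi(x) ~ p_model(x | x_t) * exp(beta * r(x)); with proposals drawn from
        # p_model(x | x_t), the model terms cancel and only the reward tilt remains.
        accept_prob = 1.0 if r_prop >= r_curr else math.exp(beta * (r_prop - r_curr))
        if random.random() < accept_prob:
            current, r_curr = proposal, r_prop      # accept the higher-reward candidate
    return current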

Cite

Text

Keramati and Jaiman. "Reward-Guided Diffusion Model for Data-Driven Black-Box Design Optimization." ICLR 2025 Workshops: DeLTa, 2025.

Markdown

[Keramati and Jaiman. "Reward-Guided Diffusion Model for Data-Driven Black-Box Design Optimization." ICLR 2025 Workshops: DeLTa, 2025.](https://mlanthology.org/iclrw/2025/keramati2025iclrw-rewardguided/)

BibTeX

@inproceedings{keramati2025iclrw-rewardguided,
  title     = {{Reward-Guided Diffusion Model for Data-Driven Black-Box Design Optimization}},
  author    = {Keramati, Hadi and Jaiman, Rajeev K.},
  booktitle = {ICLR 2025 Workshops: DeLTa},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/keramati2025iclrw-rewardguided/}
}