Improved Off-Policy Training of Diffusion Samplers

Abstract

We study the problem of training diffusion models to sample from a distribution with a given unnormalized density or energy function. We benchmark several diffusion-structured inference methods, including simulation-based variational approaches and off-policy methods (continuous generative flow networks). Our results shed light on the relative advantages of existing algorithms while bringing into question some claims from past work. We also propose a novel exploration strategy for off-policy methods, based on local search in the target space with the use of a replay buffer, and show that it improves the quality of samples on a variety of target distributions. Our code for the sampling methods and benchmarks studied is made public as a base for future work on diffusion models for amortized inference.
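The exploration strategy described above can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the authors' implementation: the names mala_refine, ReplayBuffer, log_reward, step_size, and capacity are all hypothetical. It shows the general pattern of refining sampler outputs with Metropolis-adjusted Langevin (MALA) steps on the target log-density and storing the refined points in a replay buffer for later off-policy updates.

import torch

def mala_refine(x, log_reward, n_steps=20, step_size=1e-2):
    # Refine a batch of samples x (shape [batch, dim]) toward
    # high-density regions of the target via MALA local search.
    def grad_log_reward(z):
        z = z.detach().requires_grad_(True)
        (g,) = torch.autograd.grad(log_reward(z).sum(), z)
        return g

    x = x.detach()
    for _ in range(n_steps):
        g_x = grad_log_reward(x)
        # Langevin proposal: y ~ N(x + eps * grad, 2 * eps * I)
        y = (x + step_size * g_x
             + (2 * step_size) ** 0.5 * torch.randn_like(x)).detach()
        g_y = grad_log_reward(y)
        with torch.no_grad():
            # Log proposal densities (up to a shared constant)
            log_q_fwd = -((y - x - step_size * g_x) ** 2).sum(-1) / (4 * step_size)
            log_q_bwd = -((x - y - step_size * g_y) ** 2).sum(-1) / (4 * step_size)
            # Metropolis-Hastings acceptance ratio
            log_alpha = log_reward(y) - log_reward(x) + log_q_bwd - log_q_fwd
            accept = torch.rand_like(log_alpha).log() < log_alpha
            x = torch.where(accept.unsqueeze(-1), y, x)
    return x

class ReplayBuffer:
    # FIFO buffer of refined samples for off-policy training.
    def __init__(self, capacity=50_000):
        self.buffer = []
        self.capacity = capacity

    def add(self, samples):
        self.buffer.extend(samples.detach().cpu().unbind(0))
        self.buffer = self.buffer[-self.capacity:]

    def sample(self, batch_size):
        idx = torch.randint(len(self.buffer), (batch_size,))
        return torch.stack([self.buffer[i] for i in idx])

In a training loop of this shape, one would alternate between rollouts from the current sampler and off-policy updates computed on trajectories leading to buffer samples (for example, under a trajectory-balance-style objective); the specifics of that loop are left to the paper and its released code.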

Cite

Text

Sendera et al. "Improved Off-Policy Training of Diffusion Samplers." Neural Information Processing Systems, 2024. doi:10.52202/079017-2575

Markdown

[Sendera et al. "Improved Off-Policy Training of Diffusion Samplers." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/sendera2024neurips-improved/) doi:10.52202/079017-2575

BibTeX

@inproceedings{sendera2024neurips-improved,
  title     = {{Improved Off-Policy Training of Diffusion Samplers}},
  author    = {Sendera, Marcin and Kim, Minsu and Mittal, Sarthak and Lemos, Pablo and Scimeca, Luca and Rector-Brooks, Jarrid and Adam, Alexandre and Bengio, Yoshua and Malkin, Nikolay},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-2575},
  url       = {https://mlanthology.org/neurips/2024/sendera2024neurips-improved/}
}