A Variational Perspective on Diffusion-Based Generative Models and Score Matching

Abstract

Discrete-time diffusion-based generative models and score matching methods have shown promising results in modeling high-dimensional image data. Recently, Song et al. (2021) showed that diffusion processes can be reversed by learning the score function, i.e., the gradient of the log-density of the perturbed data. They propose plugging the learned score function into a reverse-time formula to define a generative diffusion process. Despite the empirical success, a theoretical underpinning of this procedure is still lacking. In this work, we approach the (continuous-time) generative diffusion directly and derive a variational framework for likelihood estimation, which includes continuous-time normalizing flows as a special case and can be seen as an infinitely deep variational autoencoder. Under this framework, we show that minimizing the score-matching loss is equivalent to maximizing the ELBO of the plug-in reverse SDE proposed by Song et al. (2021), bridging the theoretical gap.
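The score function central to the abstract can be illustrated with a minimal sketch of denoising score matching, the standard surrogate objective for learning scores (the toy 1-D Gaussian setup and the linear score model below are illustrative assumptions, not taken from the paper). For data x ~ N(0, 1) perturbed as x̃ = x + σε, the perturbed density is N(0, 1 + σ²), so the true score is s(x̃) = -x̃ / (1 + σ²); regressing a model s_θ(x̃) = θx̃ onto the conditional target -(x̃ - x)/σ² recovers that slope:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative assumption): data x ~ N(0, 1), perturbed by
# Gaussian noise of scale sigma, so x_tilde ~ N(0, 1 + sigma^2) and the
# true score of the perturbed density is -x_tilde / (1 + sigma^2).
sigma = 0.5
n = 200_000
x = rng.standard_normal(n)
eps = rng.standard_normal(n)
x_tilde = x + sigma * eps

# Denoising score matching: fit s_theta(x_tilde) = theta * x_tilde to the
# conditional target -(x_tilde - x) / sigma^2 by ordinary least squares.
target = -(x_tilde - x) / sigma**2
theta = np.dot(x_tilde, target) / np.dot(x_tilde, x_tilde)

# Slope of the analytic score of N(0, 1 + sigma^2); the fit should match it.
true_theta = -1.0 / (1.0 + sigma**2)
print(theta, true_theta)
```

The recovered slope converges to the analytic one as the sample size grows, which is the sense in which the denoising objective learns the score of the perturbed data; in the paper's continuous-time setting, this same loss (summed over noise scales) coincides with the negative ELBO of the plug-in reverse SDE.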

Cite

Text

Huang et al. "A Variational Perspective on Diffusion-Based Generative Models and Score Matching." ICML 2021 Workshops: INNF, 2021.

Markdown

[Huang et al. "A Variational Perspective on Diffusion-Based Generative Models and Score Matching." ICML 2021 Workshops: INNF, 2021.](https://mlanthology.org/icmlw/2021/huang2021icmlw-variational/)

BibTeX

@inproceedings{huang2021icmlw-variational,
  title     = {{A Variational Perspective on Diffusion-Based Generative Models and Score Matching}},
  author    = {Huang, Chin-Wei and Lim, Jae Hyun and Courville, Aaron},
  booktitle = {ICML 2021 Workshops: INNF},
  year      = {2021},
  url       = {https://mlanthology.org/icmlw/2021/huang2021icmlw-variational/}
}