A Variational Perspective on Diffusion-Based Generative Models and Score Matching
Abstract
Discrete-time diffusion-based generative models and score matching methods have shown promising results in modeling high-dimensional image data. Recently, Song et al. (2021) showed that diffusion processes can be reverted via learning the score function, i.e. the gradient of the log-density of the perturbed data. They proposed to plug the learned score function into an inverse formula to define a generative diffusion process. Despite the empirical success, a theoretical underpinning of this procedure is still lacking. In this work, we approach the (continuous-time) generative diffusion directly and derive a variational framework for likelihood estimation, which includes continuous-time normalizing flows as a special case, and can be seen as an infinitely deep variational autoencoder. Under this framework, we show that minimizing the score-matching loss is equivalent to maximizing the ELBO of the plug-in reverse SDE proposed by Song et al. (2021), bridging the theoretical gap.
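For context, the "plug-in reverse SDE" referred to above follows the standard construction from Song et al. (2021), building on Anderson's time-reversal formula; the following is a sketch of that setup, not notation taken from this paper:

```latex
% Forward (noising) SDE perturbing the data:
%   dx = f(x, t)\,dt + g(t)\,dw
% Anderson's time-reversal runs the process backwards in time:
%   dx = \left[ f(x, t) - g(t)^2 \nabla_x \log p_t(x) \right] dt + g(t)\,d\bar{w}
% The "plug-in" reverse SDE replaces the true score \nabla_x \log p_t(x)
% with a learned approximation s_\theta(x, t):
%   dx = \left[ f(x, t) - g(t)^2 s_\theta(x, t) \right] dt + g(t)\,d\bar{w}
```

The paper's contribution, per the abstract, is showing that training s_θ by score matching is equivalent to maximizing an evidence lower bound (ELBO) on the likelihood of the generative model defined by this plug-in reverse SDE.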
Cite
Text
Huang et al. "A Variational Perspective on Diffusion-Based Generative Models and Score Matching." ICML 2021 Workshops: INNF, 2021.

Markdown
[Huang et al. "A Variational Perspective on Diffusion-Based Generative Models and Score Matching." ICML 2021 Workshops: INNF, 2021.](https://mlanthology.org/icmlw/2021/huang2021icmlw-variational/)

BibTeX
@inproceedings{huang2021icmlw-variational,
title = {{A Variational Perspective on Diffusion-Based Generative Models and Score Matching}},
author = {Huang, Chin-Wei and Lim, Jae Hyun and Courville, Aaron},
booktitle = {ICML 2021 Workshops: INNF},
year = {2021},
url = {https://mlanthology.org/icmlw/2021/huang2021icmlw-variational/}
}