DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents

Abstract

Diffusion models (DMs) have revolutionized generative learning. They utilize a diffusion process to encode data into a simple Gaussian distribution. However, encoding a complex, potentially multimodal data distribution into a single continuous Gaussian distribution arguably represents an unnecessarily challenging learning problem. We propose *Discrete-Continuous Latent Variable Diffusion Models (DisCo-Diff)* to simplify this task by introducing complementary *discrete* latent variables. We augment DMs with learnable discrete latents, inferred with an encoder, and train DM and encoder end-to-end. DisCo-Diff does not rely on pre-trained networks, making the framework universally applicable. The discrete latents significantly simplify learning the DM’s complex noise-to-data mapping by reducing the curvature of the DM’s generative ODE. An additional autoregressive transformer models the distribution of the discrete latents, a simple step because DisCo-Diff requires only a few discrete variables with small codebooks. We validate DisCo-Diff on toy data, several image synthesis tasks, as well as molecular docking, and find that introducing discrete latents consistently improves model performance. For example, DisCo-Diff achieves state-of-the-art FID scores on class-conditioned ImageNet-64/128 with an ODE sampler.
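The core idea described in the abstract — an encoder infers a small discrete latent from the data, and the denoiser is conditioned on that latent's codebook embedding — can be illustrated with a minimal toy sketch. This is not the paper's implementation; all dimensions, weights, and function names here are hypothetical, and the networks are replaced by random linear maps purely to show the data flow:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes (not from the paper): one discrete latent with a
# small codebook, reflecting that DisCo-Diff needs only a few such variables.
DATA_DIM, CODEBOOK_SIZE, EMBED_DIM = 4, 8, 4

# "Encoder": maps clean data to logits over the codebook; the discrete
# latent is the argmax index (a hard assignment, for illustration only).
W_enc = rng.normal(size=(DATA_DIM, CODEBOOK_SIZE))
codebook = rng.normal(size=(CODEBOOK_SIZE, EMBED_DIM))

def encode(x):
    """Infer the discrete latent index for a data point."""
    return int(np.argmax(x @ W_enc))

# "Denoiser": predicts clean data from a noisy input, conditioned on the
# discrete latent's embedding (here just one linear map, not a real DM).
W_den = rng.normal(size=(DATA_DIM + EMBED_DIM, DATA_DIM)) * 0.1

def denoise(x_noisy, code):
    """One conditioned denoising step: concatenate noisy input + embedding."""
    cond = np.concatenate([x_noisy, codebook[code]])
    return cond @ W_den

x = rng.normal(size=DATA_DIM)
code = encode(x)                 # discrete latent inferred by the encoder
x_noisy = x + rng.normal(size=DATA_DIM)
x_hat = denoise(x_noisy, code)   # denoising conditioned on the latent
```

In the actual model, encoder and denoiser are neural networks trained end-to-end, and at sampling time the discrete codes come from the autoregressive transformer rather than the encoder.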

Cite

Text

Xu et al. "DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents." International Conference on Machine Learning, 2024.

Markdown

[Xu et al. "DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/xu2024icml-discodiff/)

BibTeX

@inproceedings{xu2024icml-discodiff,
  title     = {{DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents}},
  author    = {Xu, Yilun and Corso, Gabriele and Jaakkola, Tommi and Vahdat, Arash and Kreis, Karsten},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {54933--54961},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/xu2024icml-discodiff/}
}