PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher

Abstract

Diffusion models are remarkably effective at generating high-dimensional content but are computationally intensive, especially during training. We propose Progressive Growing of Diffusion Autoencoder (PaGoDA), a novel pipeline that reduces training costs through three stages: training a diffusion model on downsampled data, distilling the pretrained diffusion model, and progressive super-resolution. With this pipeline, PaGoDA cuts the cost of training its diffusion model by $64\times$ by training on $8\times$ downsampled data; at inference, with a single step, it achieves state-of-the-art results on ImageNet across all resolutions from $64\times64$ to $512\times512$, as well as in text-to-image generation. PaGoDA's pipeline can also be applied directly in the latent space, adding compression alongside the pre-trained autoencoder in Latent Diffusion Models (e.g., Stable Diffusion). The code is available at https://github.com/sony/pagoda.
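
The quoted $64\times$ training-cost reduction follows directly from the $8\times$ spatial downsampling: shrinking each spatial dimension by a factor of 8 reduces the pixel count per image by $8 \times 8 = 64$. The short Python sketch below is only a minimal illustration of that arithmetic and of the three-stage structure named in the abstract; the stage descriptions are ours and do not reflect the repository's actual API.

# Cost arithmetic from the abstract: an 8x downsample per spatial dimension
# leaves 8 * 8 = 64x fewer pixels per image, which is where the ~64x
# training-cost reduction for the low-resolution diffusion teacher comes from.
full_res, factor = 512, 8
low_res = full_res // factor                 # 512 -> 64
print((full_res ** 2) / (low_res ** 2))      # 64.0

# The three stages named in the abstract, listed as placeholders
# (illustrative only, not the repository's code structure):
# 1. train a diffusion teacher on the 64x64 downsampled data
# 2. distill the pretrained teacher into a one-step generator
# 3. progressively super-resolve the generator back up to 512x512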

Cite

Text

Kim et al. "PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher." Neural Information Processing Systems, 2024. doi:10.52202/079017-0606

Markdown

[Kim et al. "PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/kim2024neurips-pagoda/) doi:10.52202/079017-0606

BibTeX

@inproceedings{kim2024neurips-pagoda,
  title     = {{PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher}},
  author    = {Kim, Dongjun and Lai, Chieh-Hsin and Liao, Wei-Hsiang and Takida, Yuhta and Murata, Naoki and Uesaka, Toshimitsu and Mitsufuji, Yuki and Ermon, Stefano},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-0606},
  url       = {https://mlanthology.org/neurips/2024/kim2024neurips-pagoda/}
}