Greedy Growing Enables High-Resolution Pixel-Based Diffusion Models

Abstract

We address the long-standing problem of how to learn effective pixel-based image diffusion models at scale, introducing a remarkably simple greedy method for stable training of large-scale, high-resolution models without the need for cascaded super-resolution components. The key insight stems from careful pre-training of core components, namely, those responsible for text-to-image alignment vs. high-resolution rendering. We first demonstrate the benefits of scaling a Shallow UNet, with no down(up)-sampling enc(dec)oder. Scaling its deep core layers is shown to improve alignment, object structure, and composition. Building on this core model, we propose a greedy algorithm that grows the architecture into high-resolution end-to-end models, while preserving the integrity of the pre-trained representation, stabilizing training, and reducing the need for large high-resolution datasets. This enables a single-stage model capable of generating high-resolution images without the need for a super-resolution cascade. Our key results rely on public datasets and show that we are able to train non-cascaded models up to 8B parameters with no further regularization schemes. Vermeer, our full pipeline model trained with internal datasets to produce 1024×1024 images without cascades, is preferred by human evaluators over SDXL, 44.0% vs. 21.4%.

Cite

Text

Vasconcelos et al. "Greedy Growing Enables High-Resolution Pixel-Based Diffusion Models." Transactions on Machine Learning Research, 2024.

Markdown

[Vasconcelos et al. "Greedy Growing Enables High-Resolution Pixel-Based Diffusion Models." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/vasconcelos2024tmlr-greedy/)

BibTeX

@article{vasconcelos2024tmlr-greedy,
  title     = {{Greedy Growing Enables High-Resolution Pixel-Based Diffusion Models}},
  author    = {Vasconcelos, Cristina Nader and Rashwan, Abdullah and Waters, Austin and Walker, Trevor and Xu, Keyang and Yan, Jimmy and Qian, Rui and Li, Yeqing and Luo, Shixin and Onoe, Yasumasa and Parekh, Zarana and Kajic, Ivana and Guo, Mandy and Zhou, Wenlei and Rosston, Sarah and Garg, Roopal and Fei, Hongliang and Pont-Tuset, Jordi and Wang, Su and Nandwani, Henna and Bunner, Andrew and Swersky, Kevin and Fleet, David J. and Wang, Oliver and Baldridge, Jason Michael},
  journal   = {Transactions on Machine Learning Research},
  year      = {2024},
  url       = {https://mlanthology.org/tmlr/2024/vasconcelos2024tmlr-greedy/}
}