UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
Abstract
Text-to-image diffusion models have demonstrated remarkable capabilities in transforming text prompts into coherent images, yet the computational cost of the multi-step inference remains a persistent challenge. To address this issue, we present UFOGen, a novel generative model designed for ultra-fast, one-step text-to-image generation. In contrast to conventional approaches that focus on improving samplers or employing distillation techniques for diffusion models, UFOGen adopts a hybrid methodology, integrating diffusion models with a GAN objective. Leveraging a newly introduced diffusion-GAN objective and initialization with pre-trained diffusion models, UFOGen excels in efficiently generating high-quality images conditioned on textual descriptions in a single step. Beyond traditional text-to-image generation, UFOGen showcases versatility in applications. Notably, UFOGen stands among the pioneering models enabling one-step text-to-image generation and diverse downstream tasks, presenting a significant advancement in the landscape of efficient generative models.
Cite
Text
Xu et al. "UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00783
Markdown
[Xu et al. "UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/xu2024cvpr-ufogen/) doi:10.1109/CVPR52733.2024.00783
BibTeX
@inproceedings{xu2024cvpr-ufogen,
title = {{UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs}},
author = {Xu, Yanwu and Zhao, Yang and Xiao, Zhisheng and Hou, Tingbo},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2024},
pages = {8196--8206},
doi = {10.1109/CVPR52733.2024.00783},
url = {https://mlanthology.org/cvpr/2024/xu2024cvpr-ufogen/}
}