Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images

Zheng, Qingping; Guo, Yuanfan; Deng, Jiankang; Han, Jianhua; Li, Ying; Xu, Songcen; Xu, Hang

doi:10.1609/AAAI.V38I7.28589

AAAI 2024 pp. 7571-7578

Abstract

Stable Diffusion, a generative model used in text-to-image synthesis, frequently encounters resolution-induced composition problems when generating images of varying sizes. This issue primarily stems from the model being trained on pairs of single-scale images and their corresponding text descriptions. Moreover, direct training on images of unlimited sizes is infeasible, as it would require an immense number of text-image pairs and entail substantial computational expense. To overcome these challenges, we propose a two-stage pipeline named Any-Size-Diffusion (ASD), designed to efficiently generate well-composed HD images of any size while minimizing the need for high-memory GPU resources. Specifically, the initial stage, dubbed Any Ratio Adaptability Diffusion (ARAD), leverages a selected set of images with a restricted range of ratios to optimize the text-conditional diffusion model, thereby improving its ability to adjust composition to accommodate diverse image sizes. To support the creation of images at any desired size, we further introduce a technique called Fast Seamless Tiled Diffusion (FSTD) at the subsequent stage. This method allows for the rapid enlargement of the ASD output to any high-resolution size, avoiding seaming artifacts or memory overloads. Experimental results on the LAION-COCO and MM-CelebA-HQ benchmarks demonstrate that ASD can produce well-structured images of arbitrary sizes, reducing inference time by 2× compared to the traditional tiled algorithm. The source code is available at https://github.com/ProAirVerse/Any-Size-Diffusion.
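
The abstract does not spell out the "traditional tiled algorithm" it compares against. As a rough illustration only, the sketch below assumes a common scheme in which a large canvas is covered by fixed-size tiles that overlap their neighbors, so each tile can be processed within a bounded memory budget. The function name `tile_boxes`, the 512-pixel tile size, and the 64-pixel overlap are hypothetical choices for this example and are not taken from ASD or FSTD.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def tile_boxes(width: int, height: int, tile: int = 512, overlap: int = 64) -> List[Box]:
    """Return boxes that together cover a width x height canvas.

    Assumed scheme (not from the paper): tiles of side `tile` are laid out
    with a stride of `tile - overlap`, and the last tile in each row and
    column is placed flush against the border.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    stride = tile - overlap

    def starts(size: int) -> List[int]:
        # A single tile suffices when the canvas is no larger than the tile.
        if size <= tile:
            return [0]
        positions = list(range(0, size - tile, stride))
        positions.append(size - tile)  # final tile ends exactly at the border
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


if __name__ == "__main__":
    for box in tile_boxes(1024, 512):
        print(box)
```

In a scheme like this, each overlapping tile is typically processed separately and the overlapping regions are blended to hide seams, which is the extra work that a faster tiling strategy would aim to reduce.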

Cite

Text

Zheng et al. "Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I7.28589

Markdown

[Zheng et al. "Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/zheng2024aaai-any/) doi:10.1609/AAAI.V38I7.28589

BibTeX

@inproceedings{zheng2024aaai-any,
  title     = {{Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images}},
  author    = {Zheng, Qingping and Guo, Yuanfan and Deng, Jiankang and Han, Jianhua and Li, Ying and Xu, Songcen and Xu, Hang},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {7571-7578},
  doi       = {10.1609/AAAI.V38I7.28589},
  url       = {https://mlanthology.org/aaai/2024/zheng2024aaai-any/}
}