ElasticDiffusion: Training-Free Arbitrary Size Image Generation Through Global-Local Content Separation
Abstract
Diffusion models have revolutionized image generation in recent years, yet they are still limited to a few sizes and aspect ratios. We propose ElasticDiffusion, a novel training-free decoding method that enables pretrained text-to-image diffusion models to generate images of various sizes. ElasticDiffusion decouples the generation trajectory of a pretrained model into local and global signals. The local signal controls low-level pixel information and can be estimated on local patches, while the global signal maintains overall structural consistency and is estimated from a reference image. We test our method on CelebA-HQ (faces) and LAION-COCO (objects/indoor/outdoor scenes). Our experiments and qualitative results show superior image coherence quality across aspect ratios compared to MultiDiffusion and the standard decoding strategy of Stable Diffusion. Project Webpage: https://elasticdiffusion.github.io
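The local/global separation described above can be illustrated with a minimal sketch. This is not the paper's implementation: `denoise_patch` stands in for a pretrained denoiser applied to a patch, and `global_direction` stands in for a structural signal derived from a low-resolution reference pass (both are hypothetical placeholders). The sketch only shows the mechanics of estimating a local signal on overlapping patches of an arbitrarily sized latent and adding a global correction.

```python
import numpy as np

def patchwise_estimate(latent, denoise_patch, patch=64, stride=48):
    """Local signal: run the denoiser on overlapping patches and average
    the predictions where patches overlap. `denoise_patch` is a
    hypothetical stand-in for a pretrained model's per-patch prediction."""
    h, w = latent.shape
    acc = np.zeros_like(latent)
    cnt = np.zeros_like(latent)
    for y in range(0, h, stride):
        for x in range(0, w, stride):
            ys = slice(y, min(y + patch, h))
            xs = slice(x, min(x + patch, w))
            acc[ys, xs] += denoise_patch(latent[ys, xs])
            cnt[ys, xs] += 1.0
    return acc / cnt  # stride <= patch guarantees full coverage

def combined_estimate(latent, denoise_patch, global_direction, guidance=1.0):
    """Add a global structural direction (e.g. upsampled from a
    reference-resolution pass) to the patch-wise local estimate."""
    return patchwise_estimate(latent, denoise_patch) + guidance * global_direction

# Toy usage on a non-square "latent" of arbitrary size.
latent = np.random.randn(96, 160)
denoise = lambda p: 0.5 * p                     # dummy denoiser
global_dir = np.zeros_like(latent)              # dummy global signal
out = combined_estimate(latent, denoise, global_dir)
```

Averaging in the overlap regions keeps adjacent patch predictions consistent at their seams, while the single global term is what carries structure across the whole image.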
Cite
Text
Haji-Ali et al. "ElasticDiffusion: Training-Free Arbitrary Size Image Generation Through Global-Local Content Separation." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00631
Markdown
[Haji-Ali et al. "ElasticDiffusion: Training-Free Arbitrary Size Image Generation Through Global-Local Content Separation." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/hajiali2024cvpr-elasticdiffusion/) doi:10.1109/CVPR52733.2024.00631
BibTeX
@inproceedings{hajiali2024cvpr-elasticdiffusion,
title = {{ElasticDiffusion: Training-Free Arbitrary Size Image Generation Through Global-Local Content Separation}},
author = {Haji-Ali, Moayed and Balakrishnan, Guha and Ordonez, Vicente},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2024},
pages = {6603-6612},
doi = {10.1109/CVPR52733.2024.00631},
url = {https://mlanthology.org/cvpr/2024/hajiali2024cvpr-elasticdiffusion/}
}