Taming Stable Diffusion for Text to 360 Panorama Image Generation

Abstract

Generative models, e.g., Stable Diffusion, have enabled the creation of photorealistic images from text prompts. Yet the generation of 360-degree panorama images from text remains a challenge, particularly due to the dearth of paired text-panorama data and the domain gap between panorama and perspective images. In this paper, we introduce a novel dual-branch diffusion model named PanFusion to generate a 360-degree image from a text prompt. We leverage the Stable Diffusion model as one branch to provide prior knowledge in natural image generation and register it to another panorama branch for holistic image generation. We propose a unique cross-attention mechanism with projection awareness to minimize distortion during the collaborative denoising process. Our experiments validate that PanFusion surpasses existing methods and, thanks to its dual-branch structure, can integrate additional constraints like room layout for customized panorama outputs.
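
The abstract describes a dual-branch architecture in which a Stable Diffusion (perspective) branch is registered into a panorama branch through projection-aware cross-attention during denoising. The PyTorch sketch below illustrates that general idea only; the class names, token shapes, toy MLP branches, and the optional attention mask standing in for the equirectangular-perspective correspondence are our own assumptions, not the authors' implementation.

# Minimal sketch of a dual-branch denoising step in the spirit of PanFusion.
# All names, shapes, and the masked-attention stand-in for the projection
# correspondence are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn

class ProjectionAwareCrossAttention(nn.Module):
    """Panorama tokens attend to perspective-branch tokens.

    A faithful implementation would restrict each panorama token to the
    perspective tokens that project onto it under the equirectangular-to-
    perspective mapping; here that correspondence is approximated by an
    optional additive attention mask.
    """
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, pano_tokens, persp_tokens, attn_mask=None):
        # pano_tokens: (B, N_pano, C); persp_tokens: (B, N_persp, C)
        out, _ = self.attn(self.norm(pano_tokens), persp_tokens, persp_tokens,
                           attn_mask=attn_mask)
        return pano_tokens + out  # residual update of the panorama branch

class DualBranchDenoiser(nn.Module):
    """Toy stand-in for the two UNet branches: each branch is a small MLP."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.pano_block = nn.Sequential(nn.Linear(dim, dim), nn.GELU(),
                                        nn.Linear(dim, dim))
        self.persp_block = nn.Sequential(nn.Linear(dim, dim), nn.GELU(),
                                         nn.Linear(dim, dim))
        self.register = ProjectionAwareCrossAttention(dim)

    def forward(self, pano_tokens, persp_tokens):
        persp_feats = self.persp_block(persp_tokens)  # perspective prior branch
        pano_feats = self.pano_block(pano_tokens)     # holistic panorama branch
        # Register perspective knowledge into the panorama branch.
        return self.register(pano_feats, persp_feats)

if __name__ == "__main__":
    B, n_pano, n_persp, dim = 2, 256, 128, 64
    model = DualBranchDenoiser(dim)
    noise_pred = model(torch.randn(B, n_pano, dim), torch.randn(B, n_persp, dim))
    print(noise_pred.shape)  # torch.Size([2, 256, 64])

In the paper, this kind of registration step would be applied at each denoising timestep so that both branches refine their latents collaboratively; the sketch shows a single forward pass with random tokens.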

Cite

Text

Zhang et al. "Taming Stable Diffusion for Text to 360 Panorama Image Generation." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00607

Markdown

[Zhang et al. "Taming Stable Diffusion for Text to 360 Panorama Image Generation." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/zhang2024cvpr-taming/) doi:10.1109/CVPR52733.2024.00607

BibTeX

@inproceedings{zhang2024cvpr-taming,
  title     = {{Taming Stable Diffusion for Text to 360 Panorama Image Generation}},
  author    = {Zhang, Cheng and Wu, Qianyi and Gambardella, Camilo Cruz and Huang, Xiaoshui and Phung, Dinh and Ouyang, Wanli and Cai, Jianfei},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {6347--6357},
  doi       = {10.1109/CVPR52733.2024.00607},
  url       = {https://mlanthology.org/cvpr/2024/zhang2024cvpr-taming/}
}