CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation
Abstract
We introduce a novel method for generating 360° panoramas from text prompts or images. Our approach leverages recent advances in 3D generation by employing multi-view diffusion models to jointly synthesize the six faces of a cubemap. Unlike previous methods that rely on processing equirectangular projections or autoregressive generation, our method treats each face as a standard perspective image, simplifying the generation process and enabling the use of existing multi-view diffusion models. We demonstrate that these models can be adapted to produce high-quality cubemaps without requiring correspondence-aware attention layers. Our model allows for fine-grained text control, generates high-resolution panoramas, and generalizes well beyond its training set, while achieving state-of-the-art results both qualitatively and quantitatively.
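The geometric intuition behind treating each face as a standard perspective image is that a cubemap face is a 90°-field-of-view pinhole view, and six such views tile the full sphere, so a panorama can be recovered from the generated faces by ordinary resampling into an equirectangular image. Below is a minimal NumPy sketch of that standard resampling step, assuming a particular face naming and axis convention; the cubemap_to_equirect helper is illustrative only and is not code from the paper.

import numpy as np

# Six cubemap faces as (forward, right, up) unit vectors. These axis
# conventions are an illustrative assumption; the paper's layout may differ.
FACES = {
    "front":  ((0, 0, 1),  (1, 0, 0),  (0, 1, 0)),
    "back":   ((0, 0, -1), (-1, 0, 0), (0, 1, 0)),
    "right":  ((1, 0, 0),  (0, 0, -1), (0, 1, 0)),
    "left":   ((-1, 0, 0), (0, 0, 1),  (0, 1, 0)),
    "top":    ((0, 1, 0),  (1, 0, 0),  (0, 0, -1)),
    "bottom": ((0, -1, 0), (1, 0, 0),  (0, 0, 1)),
}

def cubemap_to_equirect(faces, out_h=1024, out_w=2048):
    """Resample six 90-degree-FOV faces (dict of (H, W, 3) arrays keyed as in
    FACES) into one equirectangular panorama via nearest-neighbor lookup."""
    # Unit direction vector for every output pixel.
    lon = (np.arange(out_w) + 0.5) / out_w * 2 * np.pi - np.pi
    lat = np.pi / 2 - (np.arange(out_h) + 0.5) / out_h * np.pi
    lon, lat = np.meshgrid(lon, lat)
    d = np.stack([np.cos(lat) * np.sin(lon),
                  np.sin(lat),
                  np.cos(lat) * np.cos(lon)], axis=-1)

    out = np.zeros((out_h, out_w, 3), dtype=faces["front"].dtype)
    best = np.full((out_h, out_w), -np.inf)  # largest dot(d, forward) wins
    for name, (f, r, u) in FACES.items():
        img = faces[name]
        fh, fw = img.shape[:2]
        depth = d @ np.asarray(f, dtype=float)
        mask = depth > best  # this face sees the pixel more head-on
        # Pinhole projection onto the face plane; coordinates land in [-1, 1]
        # whenever this face is the dominant one.
        safe = np.maximum(depth, 1e-9)
        x = (d @ np.asarray(r, dtype=float)) / safe
        y = -(d @ np.asarray(u, dtype=float)) / safe
        col = np.clip(((x + 1) / 2 * fw).astype(int), 0, fw - 1)
        row = np.clip(((y + 1) / 2 * fh).astype(int), 0, fh - 1)
        out[mask] = img[row[mask], col[mask]]
        best = np.where(mask, depth, best)
    return out

In this convention, each output pixel's direction vector is assigned to the face whose forward axis it points into most directly, so every face only ever contains perspective-projected content, which is what allows an off-the-shelf perspective diffusion model to operate on the faces directly.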
Cite
Text
Kalischek et al. "CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation." International Conference on Learning Representations, 2025.

Markdown
[Kalischek et al. "CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/kalischek2025iclr-cubediff/)

BibTeX
@inproceedings{kalischek2025iclr-cubediff,
  title = {{CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation}},
  author = {Kalischek, Nikolai and Oechsle, Michael and Manhardt, Fabian and Henzler, Philipp and Schindler, Konrad and Tombari, Federico},
  booktitle = {International Conference on Learning Representations},
  year = {2025},
  url = {https://mlanthology.org/iclr/2025/kalischek2025iclr-cubediff/}
}