MVDiffusion: Enabling Holistic Multi-View Image Generation with Correspondence-Aware Diffusion

Abstract

This paper introduces MVDiffusion, a simple yet effective method for generating consistent multi-view images from text prompts given pixel-to-pixel correspondences (e.g., perspective crops from a panorama, or multi-view images with known depth maps and poses). Unlike prior methods that rely on iterative image warping and inpainting, MVDiffusion simultaneously generates all images with global awareness, effectively addressing the prevalent error-accumulation issue. At its core, MVDiffusion processes perspective images in parallel with a pre-trained text-to-image diffusion model, while integrating novel correspondence-aware attention layers to facilitate cross-view interactions. For panorama generation, despite being trained on only 10K panoramas, MVDiffusion generates high-resolution photorealistic images for arbitrary text prompts and can extrapolate a single perspective image to a full 360-degree view. For multi-view depth-to-image generation, MVDiffusion demonstrates state-of-the-art performance for texturing a scene mesh. The project page is at https://mvdiffusion.github.io/.
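The abstract describes the correspondence-aware attention layers only at a high level. For a concrete picture, below is a minimal, hypothetical PyTorch sketch of what such a layer could look like: each pixel of a view attends only to the features gathered from its corresponding pixels in neighboring views, looked up through a precomputed pixel-to-pixel correspondence map. The class name, tensor layout, and the `corr` lookup table are illustrative assumptions, not the authors' released implementation; the paper's layer additionally encodes the displacement between corresponding pixels, which is omitted here for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CorrespondenceAwareAttention(nn.Module):
    """Sketch of correspondence-aware attention (assumed interface).

    Each query pixel attends to the N feature vectors sitting at its
    corresponding locations in the other views, then the result is added
    back residually so the pre-trained diffusion backbone is preserved.
    """

    def __init__(self, dim, heads=8):
        super().__init__()
        self.heads = heads
        self.scale = (dim // heads) ** -0.5
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        self.to_out = nn.Linear(dim, dim)

    def forward(self, feats, corr):
        # feats: (V, H, W, C) per-view latent feature maps
        # corr:  (V, H, W, N, 3) integer (view, row, col) indices into
        #        `feats`, giving N correspondences for every pixel
        V, H, W, C = feats.shape
        N, h, d = corr.shape[3], self.heads, C // self.heads

        v_idx, y_idx, x_idx = corr.unbind(dim=-1)       # each (V, H, W, N)
        gathered = feats[v_idx, y_idx, x_idx]           # (V, H, W, N, C)

        q = self.to_q(feats).reshape(-1, 1, h, d).transpose(1, 2)     # (P, h, 1, d)
        k = self.to_k(gathered).reshape(-1, N, h, d).transpose(1, 2)  # (P, h, N, d)
        v = self.to_v(gathered).reshape(-1, N, h, d).transpose(1, 2)  # (P, h, N, d)

        # Scaled dot-product attention over the N correspondences per pixel.
        attn = F.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)  # (P, h, 1, N)
        out = (attn @ v).transpose(1, 2).reshape(V, H, W, C)
        return feats + self.to_out(out)


# Toy usage: 8 views, 16x16 latents, 64 channels, 4 correspondences per pixel.
feats = torch.randn(8, 16, 16, 64)
corr = torch.stack([
    torch.randint(0, 8, (8, 16, 16, 4)),    # view index
    torch.randint(0, 16, (8, 16, 16, 4)),   # row
    torch.randint(0, 16, (8, 16, 16, 4)),   # col
], dim=-1)
out = CorrespondenceAwareAttention(dim=64)(feats, corr)  # (8, 16, 16, 64)
```

Restricting each query pixel to a small fixed set of corresponding pixels, rather than attending densely across all views, would keep the cost linear in the number of views while still letting information flow between overlapping images at every denoising step, which is consistent with the parallel, globally aware generation the abstract describes.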

Cite

Text

Tang et al. "MVDiffusion: Enabling Holistic Multi-View Image Generation with Correspondence-Aware Diffusion." Neural Information Processing Systems, 2023.

Markdown

[Tang et al. "MVDiffusion: Enabling Holistic Multi-View Image Generation with Correspondence-Aware Diffusion." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/tang2023neurips-mvdiffusion/)

BibTeX

@inproceedings{tang2023neurips-mvdiffusion,
  title     = {{MVDiffusion: Enabling Holistic Multi-View Image Generation with Correspondence-Aware Diffusion}},
  author    = {Tang, Shitao and Zhang, Fuyang and Chen, Jiacheng and Wang, Peng and Furukawa, Yasutaka},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/tang2023neurips-mvdiffusion/}
}