DMV3D: Denoising Multi-View Diffusion Using 3D Large Reconstruction Model

Abstract

We propose DMV3D, a novel 3D generation approach that uses a transformer-based 3D large reconstruction model to denoise multi-view diffusion. The reconstruction model incorporates a triplane NeRF representation and serves as a denoiser: it removes noise from multi-view images via 3D NeRF reconstruction and rendering, achieving single-stage 3D generation within the 2D diffusion denoising process. We train DMV3D on large-scale multi-view image datasets of highly diverse objects using only image reconstruction losses, without access to 3D assets. We demonstrate state-of-the-art results on single-image reconstruction, where probabilistic modeling of unseen object parts is required to generate diverse reconstructions with sharp textures. We also show high-quality text-to-3D generation results that outperform previous 3D diffusion models. Our project website is at https://dmv3d.github.io/.
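The denoising-via-reconstruction idea in the abstract can be sketched as a standard x0-prediction diffusion sampling loop: at each step the model directly predicts the clean multi-view images (in DMV3D, by reconstructing a 3D model and re-rendering the views), and the sampler steps toward that prediction. The schedule, function names, and DDIM-style deterministic update below are illustrative assumptions for a minimal sketch, not the paper's exact procedure.

```python
import numpy as np

def ddpm_alphas(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule and cumulative alpha products (assumed schedule)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def sample_views(denoiser, shape, num_steps=50, seed=0):
    """Sampling loop where `denoiser(x, t)` predicts the clean views x0.

    In DMV3D the denoiser would reconstruct a triplane NeRF from the noisy
    views and render clean views; here it is a stand-in callable.
    """
    rng = np.random.default_rng(seed)
    alphas_bar = ddpm_alphas(num_steps)
    x = rng.standard_normal(shape)  # start from pure noise
    for t in reversed(range(num_steps)):
        x0_pred = denoiser(x, t)  # e.g. reconstruct 3D, render the views
        a_bar = alphas_bar[t]
        a_prev = alphas_bar[t - 1] if t > 0 else 1.0
        # Recover the implied noise, then take a DDIM-style (eta=0) step
        # toward the predicted clean images.
        eps = (x - np.sqrt(a_bar) * x0_pred) / np.sqrt(1.0 - a_bar)
        x = np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps
    return x
```

With a perfect (constant) denoiser, the loop converges exactly to the predicted clean views at the final step, since the noise weight `sqrt(1 - a_prev)` vanishes when `a_prev = 1`.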

Cite

Text

Xu et al. "DMV3D: Denoising Multi-View Diffusion Using 3D Large Reconstruction Model." International Conference on Learning Representations, 2024.

Markdown

[Xu et al. "DMV3D: Denoising Multi-View Diffusion Using 3D Large Reconstruction Model." International Conference on Learning Representations, 2024.](https://mlanthology.org/iclr/2024/xu2024iclr-dmv3d/)

BibTeX

@inproceedings{xu2024iclr-dmv3d,
  title     = {{DMV3D: Denoising Multi-View Diffusion Using 3D Large Reconstruction Model}},
  author    = {Xu, Yinghao and Tan, Hao and Luan, Fujun and Bi, Sai and Wang, Peng and Li, Jiahao and Shi, Zifan and Sunkavalli, Kalyan and Wetzstein, Gordon and Xu, Zexiang and Zhang, Kai},
  booktitle = {International Conference on Learning Representations},
  year      = {2024},
  url       = {https://mlanthology.org/iclr/2024/xu2024iclr-dmv3d/}
}