CAT3D: Create Anything in 3D with Multi-View Diffusion Models

Abstract

Advances in 3D reconstruction have enabled high-quality 3D capture, but capturing a scene still requires a user to collect hundreds to thousands of images. We present CAT3D, a method for creating anything in 3D by simulating this real-world capture process with a multi-view diffusion model. Given any number of input images and a set of target novel viewpoints, our model generates highly consistent novel views of a scene. These generated views can be used as input to robust 3D reconstruction techniques to produce 3D representations that can be rendered from any viewpoint in real time. CAT3D can create entire 3D scenes in as little as one minute, and outperforms existing methods for single-image and few-view 3D scene creation.

Cite

Text

Gao et al. "CAT3D: Create Anything in 3D with Multi-View Diffusion Models." Neural Information Processing Systems, 2024. doi:10.52202/079017-2403

Markdown

[Gao et al. "CAT3D: Create Anything in 3D with Multi-View Diffusion Models." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/gao2024neurips-cat3d/) doi:10.52202/079017-2403

BibTeX

@inproceedings{gao2024neurips-cat3d,
  title     = {{CAT3D: Create Anything in 3D with Multi-View Diffusion Models}},
  author    = {Gao, Ruiqi and Hołyński, Aleksander and Henzler, Philipp and Brussee, Arthur and Martin-Brualla, Ricardo and Srinivasan, Pratul and Barron, Jonathan T. and Poole, Ben},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-2403},
  url       = {https://mlanthology.org/neurips/2024/gao2024neurips-cat3d/}
}