Magic3D: High-Resolution Text-to-3D Content Creation

Abstract

Recently, DreamFusion demonstrated the utility of a pretrained text-to-image diffusion model to optimize Neural Radiance Fields (NeRF), achieving remarkable text-to-3D synthesis results. However, the method has two inherent limitations: 1) optimization of the NeRF representation is extremely slow, and 2) NeRF is supervised by images at a low resolution (64x64), thus leading to low-quality 3D models with a long wait time. In this paper, we address these limitations by utilizing a two-stage coarse-to-fine optimization framework. In the first stage, we use a sparse 3D neural representation to accelerate optimization while using a low-resolution diffusion prior. In the second stage, we use a textured mesh model initialized from the coarse neural representation, allowing us to perform optimization with a very efficient differentiable renderer interacting with high-resolution images. Our method, dubbed Magic3D, can create a 3D mesh model in 40 minutes, 2x faster than DreamFusion (reportedly taking 1.5 hours on average), while achieving 8x higher resolution. User studies show that 61.7% of raters prefer our approach over DreamFusion. Together with image-conditioned generation capabilities, we provide users with new ways to control 3D synthesis, opening up new avenues for various creative applications.
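
The abstract outlines the two-stage recipe only at a high level. Below is a minimal, non-authoritative PyTorch sketch of what such a coarse-to-fine, diffusion-supervised optimization loop could look like. All names here (render_coarse, render_fine, diffusion_prior_grad) are hypothetical stubs, not the paper's implementation; the actual method would use a sparse 3D neural representation, a differentiable mesh renderer, and a pretrained text-to-image diffusion prior in their place.

import torch

def render_coarse(params, camera):
    # Stand-in for low-resolution rendering of a sparse 3D neural representation.
    return torch.sigmoid(params).expand(1, 3, 64, 64)

def render_fine(params, camera):
    # Stand-in for high-resolution rendering of the textured mesh model
    # with an efficient differentiable renderer.
    return torch.sigmoid(params).expand(1, 3, 512, 512)

def diffusion_prior_grad(image, prompt):
    # Stand-in for the gradient supplied by a pretrained text-to-image
    # diffusion prior; here it merely pulls pixels toward gray.
    return (image - 0.5).detach()

def optimize(params, render_fn, prompt, steps, lr=1e-2):
    opt = torch.optim.Adam([params], lr=lr)
    for _ in range(steps):
        camera = torch.randn(3)                      # random viewpoint per step
        image = render_fn(params, camera)
        grad = diffusion_prior_grad(image, prompt)   # diffusion model as supervision
        opt.zero_grad()
        image.backward(gradient=grad)                # inject the prior's gradient
        opt.step()
    return params

prompt = "a DSLR photo of a squirrel"                          # example text prompt
scene = torch.zeros(1, requires_grad=True)
scene = optimize(scene, render_coarse, prompt, steps=200)      # stage 1: coarse
scene = scene.detach().clone().requires_grad_(True)            # initialize stage 2 from stage 1
scene = optimize(scene, render_fine, prompt, steps=200)        # stage 2: high resolution

The key structural point the sketch illustrates is that the second stage reuses the result of the first as its initialization, so the expensive high-resolution optimization starts from a reasonable coarse shape rather than from scratch.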

Cite

Text

Lin et al. "Magic3D: High-Resolution Text-to-3D Content Creation." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.00037

Markdown

[Lin et al. "Magic3D: High-Resolution Text-to-3D Content Creation." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/lin2023cvpr-magic3d/) doi:10.1109/CVPR52729.2023.00037

BibTeX

@inproceedings{lin2023cvpr-magic3d,
  title     = {{Magic3D: High-Resolution Text-to-3D Content Creation}},
  author    = {Lin, Chen-Hsuan and Gao, Jun and Tang, Luming and Takikawa, Towaki and Zeng, Xiaohui and Huang, Xun and Kreis, Karsten and Fidler, Sanja and Liu, Ming-Yu and Lin, Tsung-Yi},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {300--309},
  doi       = {10.1109/CVPR52729.2023.00037},
  url       = {https://mlanthology.org/cvpr/2023/lin2023cvpr-magic3d/}
}