Direct2.5: Diverse Text-to-3D Generation via Multi-View 2.5d Diffusion
Abstract
Recent advances in generative AI have unveiled significant potential for the creation of 3D content. However, current methods either apply a pre-trained 2D diffusion model with the time-consuming score distillation sampling (SDS), or train a direct 3D diffusion model on limited 3D data, which sacrifices generation diversity. In this work, we approach the problem by employing a multi-view 2.5D diffusion model fine-tuned from a pre-trained 2D diffusion model. The multi-view 2.5D diffusion directly models the structural distribution of 3D data while still maintaining the strong generalization ability of the original 2D diffusion model, filling the gap between 2D diffusion-based and direct 3D diffusion-based methods for 3D content generation. During inference, multi-view normal maps are generated using the 2.5D diffusion, and a novel differentiable rasterization scheme is introduced to fuse the almost consistent multi-view normal maps into a consistent 3D model. We further design a normal-conditioned multi-view image generation module for fast appearance generation given the 3D geometry. Our method is a one-pass diffusion process and does not require any SDS optimization as post-processing. We demonstrate through extensive experiments that our direct 2.5D generation with the specially designed fusion scheme can achieve diverse, mode-seeking-free, and high-fidelity 3D content generation in only 10 seconds.
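The abstract does not detail how "almost consistent" normal maps become one consistent shape. As a rough intuition only, here is a toy 1D analogue under our own assumptions: several noisy "views" each report a normal for every segment of a height profile z = h(x), and a single profile is fitted to all of them by gradient descent on a least-squares slope loss. This is plain least-squares normal integration, a swapped-in textbook technique, and not the paper's differentiable rasterization scheme; the functions `normals_from_heights`, `perturb`, and `fuse_normal_views` and all parameter values are hypothetical.

```python
import math
import random


def normals_from_heights(h, dx=1.0):
    """Unit 2D normals (nx, nz) of each segment of the profile z = h(x)."""
    normals = []
    for i in range(len(h) - 1):
        slope = (h[i + 1] - h[i]) / dx
        norm = math.hypot(slope, 1.0)
        # The tangent is (1, slope), so (-slope, 1) is perpendicular to it.
        normals.append((-slope / norm, 1.0 / norm))
    return normals


def perturb(normals, sigma, rng):
    """Simulate one 'almost consistent' view by jittering and renormalizing."""
    noisy = []
    for nx, nz in normals:
        nx, nz = nx + rng.gauss(0.0, sigma), nz + rng.gauss(0.0, sigma)
        norm = math.hypot(nx, nz)
        noisy.append((nx / norm, nz / norm))
    return noisy


def fuse_normal_views(views, dx=1.0, lr=0.2, steps=3000):
    """Fit a single height profile to several per-view normal maps (toy).

    Minimizes the mean over views of sum_i ((h[i+1] - h[i]) / dx - s_v[i])**2,
    where s_v[i] = -nx / nz is the slope implied by view v's normal.
    h[0] is pinned to 0 because normals fix the surface only up to an offset.
    """
    slopes = [[-nx / nz for nx, nz in view] for view in views]
    n_views = len(slopes)
    n_points = len(slopes[0]) + 1
    h = [0.0] * n_points
    for _ in range(steps):
        grad = [0.0] * n_points
        for s in slopes:
            for i in range(n_points - 1):
                r = (h[i + 1] - h[i]) / dx - s[i]
                grad[i + 1] += 2.0 * r / dx / n_views
                grad[i] -= 2.0 * r / dx / n_views
        for i in range(1, n_points):  # leave the pinned h[0] untouched
            h[i] -= lr * grad[i]
    return h


rng = random.Random(0)
true_h = [0.0, 0.5, 1.2, 1.5, 1.3, 0.8, 0.6, 0.9]
clean = normals_from_heights(true_h)
views = [perturb(clean, sigma=0.05, rng=rng) for _ in range(4)]
fused = fuse_normal_views(views)
print([round(v, 2) for v in fused])
```

In this toy the loss separates per segment, so the optimum simply uses the mean slope across views; gradient descent is used only to echo the idea of fusing views through an optimized, differentiable objective.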
Cite
Text
Lu et al. "Direct2.5: Diverse Text-to-3D Generation via Multi-View 2.5d Diffusion." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00835
Markdown
[Lu et al. "Direct2.5: Diverse Text-to-3D Generation via Multi-View 2.5d Diffusion." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/lu2024cvpr-direct2/) doi:10.1109/CVPR52733.2024.00835
BibTeX
@inproceedings{lu2024cvpr-direct2,
title = {{Direct2.5: Diverse Text-to-3D Generation via Multi-View 2.5d Diffusion}},
author = {Lu, Yuanxun and Zhang, Jingyang and Li, Shiwei and Fang, Tian and McKinnon, David and Tsin, Yanghai and Quan, Long and Cao, Xun and Yao, Yao},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2024},
pages = {8744-8753},
doi = {10.1109/CVPR52733.2024.00835},
url = {https://mlanthology.org/cvpr/2024/lu2024cvpr-direct2/}
}