PI3D: Efficient Text-to-3D Generation with Pseudo-Image Diffusion

Liu, Ying-Tian; Guo, Yuan-Chen; Luo, Guan; Sun, Heyi; Yin, Wei; Zhang, Song-Hai

doi:10.1109/CVPR52733.2024.01882

CVPR 2024 pp. 19915-19924

Abstract

Diffusion models trained on large-scale text-image datasets have demonstrated a strong capability for controllable, high-quality image generation from arbitrary text prompts. However, the generation quality and generalization ability of 3D diffusion models are hindered by the scarcity of high-quality, large-scale 3D datasets. In this paper, we present PI3D, a framework that fully leverages pre-trained text-to-image diffusion models to generate high-quality 3D shapes from text prompts in minutes. The core idea is to connect the 2D and 3D domains by representing a 3D shape as a set of Pseudo RGB Images. We fine-tune an existing text-to-image diffusion model to produce such pseudo-images using a small number of text-3D pairs. Surprisingly, we find that it can already generate meaningful and consistent 3D shapes given complex text descriptions. We further take the generated shapes as the starting point for a lightweight iterative refinement using score distillation sampling, achieving high-quality generation under a low computational budget. PI3D generates a single 3D shape from text in only 3 minutes, and its quality is validated to outperform existing 3D generative models by a large margin.
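The refinement stage relies on score distillation sampling (SDS). The abstract does not give PI3D's loss weighting, noise schedule, or 3D representation, so the following is only a generic sketch of the standard DreamFusion-style SDS update, not the authors' implementation: render the current parameters θ, noise the render at a random timestep t, query a frozen noise predictor, and use w(t)(ε̂ − ε)·∂x/∂θ as the gradient without backpropagating through the predictor. Every concrete piece is an assumption: `toy_noise_predictor` is a hypothetical oracle standing in for a diffusion model, `render` is a hypothetical identity renderer, the cosine schedule and w(t) = σ_t² are illustrative choices, and `refine` only mimics "starting from a generated shape" by beginning at a given initial vector.

```python
import math
import random


def alpha_sigma(t):
    # Assumed variance-preserving cosine schedule: alpha_t**2 + sigma_t**2 == 1.
    return math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t)


def toy_noise_predictor(z_t, t, target):
    # Hypothetical stand-in for a frozen diffusion model: the exact noise
    # prediction when all "data" sits at `target`.
    a, s = alpha_sigma(t)
    return [(z - a * y) / s for z, y in zip(z_t, target)]


def render(theta):
    # Hypothetical identity "renderer": the image x = g(theta) is theta itself,
    # so dx/dtheta is the identity matrix.
    return list(theta)


def sds_step(theta, target, lr, rng):
    """One score-distillation update of the parameters theta."""
    x = render(theta)
    t = rng.uniform(0.02, 0.98)              # random timestep, endpoints avoided
    a, s = alpha_sigma(t)
    eps = [rng.gauss(0.0, 1.0) for _ in x]   # Gaussian noise
    z_t = [a * xi + s * ei for xi, ei in zip(x, eps)]  # forward-noised render
    eps_hat = toy_noise_predictor(z_t, t, target)
    w = s ** 2                               # weighting w(t); an assumed choice
    # SDS gradient w(t) * (eps_hat - eps) * dx/dtheta; the noise predictor is
    # treated as a constant, so nothing is backpropagated through it.
    grad = [w * (eh - e) for eh, e in zip(eps_hat, eps)]
    return [p - lr * g for p, g in zip(theta, grad)]


def refine(init, target, steps=500, lr=0.5, seed=0):
    # Iterative refinement starting from an initial estimate, e.g. the output
    # of a feed-forward generator (here just a fixed vector).
    rng = random.Random(seed)
    theta = list(init)
    for _ in range(steps):
        theta = sds_step(theta, target, lr, rng)
    return theta


if __name__ == "__main__":
    refined = refine(init=[0.0, 0.0, 0.0], target=[1.0, -2.0, 0.5])
    print([round(v, 3) for v in refined])
```

With this oracle predictor, ε̂ − ε reduces to (α_t/σ_t)(x − target), so each step pulls θ toward the target. In a real pipeline the predictor would be a pre-trained text-conditioned diffusion model and θ the parameters of a 3D representation.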

Cite

Text

Liu et al. "PI3D: Efficient Text-to-3D Generation with Pseudo-Image Diffusion." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.01882

Markdown

[Liu et al. "PI3D: Efficient Text-to-3D Generation with Pseudo-Image Diffusion." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/liu2024cvpr-pi3d/) doi:10.1109/CVPR52733.2024.01882

BibTeX

@inproceedings{liu2024cvpr-pi3d,
  title     = {{PI3D: Efficient Text-to-3D Generation with Pseudo-Image Diffusion}},
  author    = {Liu, Ying-Tian and Guo, Yuan-Chen and Luo, Guan and Sun, Heyi and Yin, Wei and Zhang, Song-Hai},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {19915-19924},
  doi       = {10.1109/CVPR52733.2024.01882},
  url       = {https://mlanthology.org/cvpr/2024/liu2024cvpr-pi3d/}
}