Let 2D Diffusion Model Know 3D-Consistency for Robust Text-to-3D Generation

Seo, Junyoung; Jang, Wooseok; Kwak, Min-Seop; Kim, Hyeonsu; Ko, Jaehoon; Kim, Junho; Kim, Jin-Hwa; Lee, Jiyoung; Kim, Seungryong

ICLR 2024

Abstract

Text-to-3D generation has progressed rapidly in recent years with the advent of score distillation sampling (SDS), a method that uses pretrained text-to-2D diffusion models to optimize a neural radiance field (NeRF) in a zero-shot setting. However, the lack of 3D awareness in the 2D diffusion model often destabilizes previous methods, preventing them from generating a plausible 3D scene. To address this issue, we propose 3DFuse, a novel framework that incorporates 3D awareness into the pretrained 2D diffusion model, enhancing the robustness and 3D consistency of score distillation-based methods. Specifically, we introduce a consistency injection module that constructs a 3D point cloud from the text prompt and uses its projected depth map at a given view as a condition for the diffusion model. The 2D diffusion model, through its generative capability, robustly infers dense structure from the sparse point cloud depth map and generates a geometrically consistent and coherent 3D scene. We also introduce a new technique called semantic coding that reduces the semantic ambiguity of the text prompt for improved results. Our method can be easily adapted to various text-to-3D baselines, and we experimentally demonstrate that it notably enhances the 3D consistency of generated scenes compared with previous baselines, achieving state-of-the-art performance in geometric robustness and fidelity.
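
To illustrate what a "projected depth map at a given view" of a sparse point cloud can look like, the following is a minimal sketch, assuming a simple pinhole camera with intrinsics `K` and a world-to-camera pose `(R, t)`. The function name `project_depth`, its parameters, and the nearest-point z-buffer rule are hypothetical choices made for this illustration; they are not taken from the paper, and the actual 3DFuse implementation may differ.

```python
import numpy as np

def project_depth(points, K, R, t, height, width):
    """Project world-space points (N, 3) into a sparse depth map.

    Assumed convention (not from the paper): x_cam = R @ x_world + t,
    and pixels that receive no point are left at 0.
    """
    cam = points @ R.T + t                      # (N, 3) camera coordinates
    cam = cam[cam[:, 2] > 1e-6]                 # drop points behind the camera
    uvw = cam @ K.T                             # homogeneous pixel coordinates
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    z = cam[:, 2]

    # Keep only points that land inside the image.
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z = u[inside], v[inside], z[inside]

    # Z-buffer: when several points hit one pixel, keep the nearest one.
    depth = np.full((height, width), np.inf)
    np.minimum.at(depth, (v, u), z)
    depth[np.isinf(depth)] = 0.0                # empty pixels become 0
    return depth

# Example: a random point cloud in front of an identity-pose camera.
rng = np.random.default_rng(0)
cloud = rng.uniform(-1.0, 1.0, size=(500, 3)) + np.array([0.0, 0.0, 3.0])
K = np.array([[64.0, 0.0, 32.0], [0.0, 64.0, 32.0], [0.0, 0.0, 1.0]])
sparse_depth = project_depth(cloud, K, np.eye(3), np.zeros(3), 64, 64)
```

Most pixels of `sparse_depth` stay empty, which is the sparsity the abstract refers to: the conditioned diffusion model is expected to infer dense structure from such a map.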

Cite

Text

Seo et al. "Let 2D Diffusion Model Know 3D-Consistency for Robust Text-to-3D Generation." International Conference on Learning Representations, 2024.

Markdown

[Seo et al. "Let 2D Diffusion Model Know 3D-Consistency for Robust Text-to-3D Generation." International Conference on Learning Representations, 2024.](https://mlanthology.org/iclr/2024/seo2024iclr-let/)

BibTeX

@inproceedings{seo2024iclr-let,
  title     = {{Let 2D Diffusion Model Know 3D-Consistency for Robust Text-to-3D Generation}},
  author    = {Seo, Junyoung and Jang, Wooseok and Kwak, Min-Seop and Kim, Hyeonsu and Ko, Jaehoon and Kim, Junho and Kim, Jin-Hwa and Lee, Jiyoung and Kim, Seungryong},
  booktitle = {International Conference on Learning Representations},
  year      = {2024},
  url       = {https://mlanthology.org/iclr/2024/seo2024iclr-let/}
}