Director3D: Real-World Camera Trajectory and 3D Scene Generation from Text

Abstract

Recent advancements in 3D generation have leveraged synthetic datasets with ground-truth 3D assets and predefined camera trajectories. However, the potential of adopting real-world datasets, which can produce significantly more realistic 3D scenes, remains largely unexplored. In this work, we delve into the key challenge of the complex and scene-specific camera trajectories found in real-world captures. We introduce Director3D, a robust open-world text-to-3D generation framework, designed to generate both real-world 3D scenes and adaptive camera trajectories. To achieve this, (1) we first utilize a Trajectory Diffusion Transformer, acting as the Cinematographer, to model the distribution of camera trajectories based on textual descriptions. (2) Next, a Gaussian-driven Multi-view Latent Diffusion Model serves as the Decorator, modeling the image sequence distribution given the camera trajectories and texts. This model, fine-tuned from a 2D diffusion model, directly generates pixel-aligned 3D Gaussians as an immediate 3D scene representation for consistent denoising. (3) Lastly, the 3D Gaussians are further refined by a novel SDS++ loss as the Detailer, which incorporates the prior of the 2D diffusion model. Extensive experiments demonstrate that Director3D outperforms existing methods in real-world 3D generation.
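The three stages described in the abstract compose into a simple pipeline: text to camera trajectory, text plus trajectory to 3D Gaussians, then refinement. The sketch below illustrates only that data flow and is not the authors' implementation. Every name in it (`Camera`, `Scene`, `generate_scene`, and the `dummy_*` stages) is hypothetical, and the dummy stages stand in for the Trajectory Diffusion Transformer, the Gaussian-driven Multi-view Latent Diffusion Model, and the SDS++ refinement, none of which is reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Camera:
    """Hypothetical camera pose: a position and a look-at target."""
    position: Tuple[float, float, float]
    target: Tuple[float, float, float]


@dataclass
class Scene:
    """Hypothetical scene container: one placeholder Gaussian set per view."""
    gaussians: list
    refined: bool = False


# Each stage is an injected callable, so no real model API is assumed.
Cinematographer = Callable[[str], List[Camera]]
Decorator = Callable[[str, Sequence[Camera]], Scene]
Detailer = Callable[[str, Scene], Scene]


def generate_scene(
    prompt: str,
    cinematographer: Cinematographer,
    decorator: Decorator,
    detailer: Detailer,
) -> Tuple[List[Camera], Scene]:
    """Run the three stages in the order the abstract describes."""
    trajectory = cinematographer(prompt)       # (1) text -> camera trajectory
    scene = decorator(prompt, trajectory)      # (2) text + cameras -> 3D Gaussians
    return trajectory, detailer(prompt, scene)  # (3) refinement of the Gaussians


# Dummy stand-ins (assumptions for illustration only, not real models).
def dummy_cinematographer(prompt: str) -> List[Camera]:
    return [Camera((float(i), 0.0, 2.0), (0.0, 0.0, 0.0)) for i in range(4)]


def dummy_decorator(prompt: str, cameras: Sequence[Camera]) -> Scene:
    return Scene(gaussians=[{"view": i} for i in range(len(cameras))])


def dummy_detailer(prompt: str, scene: Scene) -> Scene:
    return Scene(gaussians=scene.gaussians, refined=True)


trajectory, scene = generate_scene(
    "a cozy cabin in the woods",
    dummy_cinematographer,
    dummy_decorator,
    dummy_detailer,
)
```

Passing the stages in as callables keeps the sketch independent of any particular model code; the point is only that the trajectory is sampled first and then conditions scene generation, rather than being fixed in advance.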

Cite

Text

Li et al. "Director3D: Real-World Camera Trajectory and 3D Scene Generation from Text." Neural Information Processing Systems, 2024. doi:10.52202/079017-2391

Markdown

[Li et al. "Director3D: Real-World Camera Trajectory and 3D Scene Generation from Text." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/li2024neurips-director3d/) doi:10.52202/079017-2391

BibTeX

@inproceedings{li2024neurips-director3d,
  title     = {{Director3D: Real-World Camera Trajectory and 3D Scene Generation from Text}},
  author    = {Li, Xinyang and Lai, Zhangyu and Xu, Linning and Qu, Yansong and Cao, Liujuan and Zhang, Shengchuan and Dai, Bo and Ji, Rongrong},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-2391},
  url       = {https://mlanthology.org/neurips/2024/li2024neurips-director3d/}
}