ATT3D: Amortized Text-to-3D Object Synthesis

Abstract

Text-to-3D modelling has seen exciting progress by combining generative text-to-image models with image-to-3D methods like Neural Radiance Fields. DreamFusion recently achieved high-quality results but requires a lengthy, per-prompt optimization to create 3D objects. To address this, we amortize optimization over text prompts by training on many prompts simultaneously with a unified model, instead of separately. With this, we share computation across a prompt set, training in less time than per-prompt optimization. Our framework, Amortized Text-to-3D (ATT3D), enables knowledge sharing between prompts to generalize to unseen setups and to produce smooth interpolations between text prompts, yielding novel assets and simple animations.
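
As a rough illustration of the amortization described above, here is a minimal PyTorch sketch, not ATT3D's actual architecture or training code: the class names are hypothetical, the text encoder is replaced by random embeddings, and the score-distillation loss is a placeholder stub. The key idea it shows is that one unified model conditions a shared neural field on a text embedding, so each optimizer step is amortized across a batch of prompts, and interpolating two prompt embeddings yields intermediate assets.

```python
# Hypothetical sketch of amortized text-to-3D training (not the official ATT3D code).
import torch
import torch.nn as nn

class TextConditionedField(nn.Module):
    """One unified model for all prompts: (3D point, text embedding) -> density/RGB."""
    def __init__(self, text_dim=64, hidden=128):
        super().__init__()
        self.mapper = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, hidden))
        self.field = nn.Sequential(nn.Linear(3 + hidden, hidden), nn.ReLU(),
                                   nn.Linear(hidden, 4))  # (density, R, G, B)

    def forward(self, xyz, text_emb):
        cond = self.mapper(text_emb)                        # prompt -> conditioning code
        cond = cond.unsqueeze(1).expand(-1, xyz.shape[1], -1)
        return self.field(torch.cat([xyz, cond], dim=-1))

def sds_loss_stub(rendered):
    # Placeholder for a score-distillation-style loss from a frozen
    # text-to-image model (as in DreamFusion); not implemented here.
    return rendered.square().mean()

model = TextConditionedField()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Amortized training: each step samples a *batch of different prompts*,
# so computation is shared across the whole prompt set.
for step in range(100):
    text_emb = torch.randn(8, 64)           # stand-in for 8 encoded text prompts
    xyz = torch.rand(8, 1024, 3) * 2 - 1    # sampled 3D query points per prompt
    loss = sds_loss_stub(model(xyz, text_emb))
    opt.zero_grad()
    loss.backward()
    opt.step()

# Interpolating two prompt embeddings gives intermediate assets; sweeping
# alpha over time gives the simple animations mentioned in the abstract.
emb_a, emb_b = torch.randn(64), torch.randn(64)
alpha = 0.5
blended = ((1 - alpha) * emb_a + alpha * emb_b).unsqueeze(0)
out_ab = model(torch.rand(1, 1024, 3) * 2 - 1, blended)  # query the blended prompt
```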

Cite

Text

Lorraine et al. "ATT3D: Amortized Text-to-3D Object Synthesis." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.01645

Markdown

[Lorraine et al. "ATT3D: Amortized Text-to-3D Object Synthesis." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/lorraine2023iccv-att3d/) doi:10.1109/ICCV51070.2023.01645

BibTeX

@inproceedings{lorraine2023iccv-att3d,
  title     = {{ATT3D: Amortized Text-to-3D Object Synthesis}},
  author    = {Lorraine, Jonathan and Xie, Kevin and Zeng, Xiaohui and Lin, Chen-Hsuan and Takikawa, Towaki and Sharp, Nicholas and Lin, Tsung-Yi and Liu, Ming-Yu and Fidler, Sanja and Lucas, James},
  booktitle = {International Conference on Computer Vision},
  year      = {2023},
  pages     = {17946--17956},
  doi       = {10.1109/ICCV51070.2023.01645},
  url       = {https://mlanthology.org/iccv/2023/lorraine2023iccv-att3d/}
}