ATT3D: Amortized Text-to-3D Object Synthesis
Abstract
Text-to-3D modelling has seen exciting progress by combining generative text-to-image models with image-to-3D methods like Neural Radiance Fields. DreamFusion recently achieved high-quality results but requires a lengthy, per-prompt optimization to create 3D objects. To address this, we amortize optimization over text prompts by training on many prompts simultaneously with a unified model instead of separately. With this, we share computation across a prompt set, training in less time than per-prompt optimization. Our framework, Amortized Text-to-3D (ATT3D), enables knowledge sharing between prompts to generalize to unseen setups and smooth interpolations between text for novel assets and simple animations.
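The abstract's core idea, amortizing text-to-3D optimization by training one text-conditioned model over many prompts at once, can be illustrated with a minimal sketch. The code below is not the authors' implementation: the module sizes, the `TextConditionedField` architecture, and the `toy_sds_loss` stand-in (replacing the diffusion-guided Score Distillation Sampling loss that ATT3D inherits from DreamFusion) are all assumptions for illustration only.

```python
# Minimal sketch (NOT the ATT3D code) of amortized text-to-3D training:
# one unified model is optimized over a whole prompt set, instead of
# running a separate per-prompt optimization as in DreamFusion.
import torch
import torch.nn as nn

class TextConditionedField(nn.Module):
    """Toy stand-in for a text-conditioned 3D field: a prompt embedding
    modulates a small MLP mapping 3D points to (density, rgb)."""
    def __init__(self, text_dim=32, hidden=64):
        super().__init__()
        self.modulator = nn.Linear(text_dim, hidden)   # prompt -> modulation
        self.field = nn.Sequential(
            nn.Linear(3 + hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),                      # 1 density + 3 rgb
        )

    def forward(self, points, text_emb):
        mod = self.modulator(text_emb).expand(points.shape[0], -1)
        return self.field(torch.cat([points, mod], dim=-1))

def toy_sds_loss(pred, text_emb):
    # Placeholder for the real SDS loss (which scores rendered images with
    # a frozen text-to-image diffusion model); a dummy objective here so
    # the sketch runs end to end.
    return (pred - text_emb.mean()).pow(2).mean()

model = TextConditionedField()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
prompt_embs = torch.randn(100, 32)  # stand-in for frozen text-encoder outputs

for step in range(1000):
    idx = torch.randint(0, len(prompt_embs), (8,))   # sample a prompt batch
    loss = 0.0
    for emb in prompt_embs[idx]:
        points = torch.rand(256, 3)                  # stand-in for ray samples
        loss = loss + toy_sds_loss(model(points, emb), emb)
    opt.zero_grad()
    (loss / 8).backward()                            # shared gradient step
    opt.step()
```

Because a single network carries the whole prompt set, an unseen prompt embedding (or an interpolation between two embeddings) yields a 3D field in one forward pass, which is what enables the generalization, interpolation, and simple-animation results the abstract describes.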
Cite
Text
Lorraine et al. "ATT3D: Amortized Text-to-3D Object Synthesis." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.01645

Markdown

[Lorraine et al. "ATT3D: Amortized Text-to-3D Object Synthesis." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/lorraine2023iccv-att3d/) doi:10.1109/ICCV51070.2023.01645

BibTeX
@inproceedings{lorraine2023iccv-att3d,
title = {{ATT3D: Amortized Text-to-3D Object Synthesis}},
author = {Lorraine, Jonathan and Xie, Kevin and Zeng, Xiaohui and Lin, Chen-Hsuan and Takikawa, Towaki and Sharp, Nicholas and Lin, Tsung-Yi and Liu, Ming-Yu and Fidler, Sanja and Lucas, James},
booktitle = {International Conference on Computer Vision},
year = {2023},
pages = {17946--17956},
doi = {10.1109/ICCV51070.2023.01645},
url = {https://mlanthology.org/iccv/2023/lorraine2023iccv-att3d/}
}