Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with Image-Based Surface Representation
Abstract
Generating high-quality 3D objects from textual descriptions remains a challenging problem due to high computational costs, the scarcity of 3D data, and the complexity of 3D representations. We introduce Geometry Image Diffusion (GIMDiffusion), a novel Text-to-3D model that utilizes geometry images to efficiently represent 3D shapes using 2D images, thereby avoiding the need for complex 3D-aware architectures. By integrating a Collaborative Control mechanism, we exploit the rich 2D priors of existing Text-to-Image models, such as Stable Diffusion, to achieve strong generalization despite limited 3D training data. This allows us to use only high-quality training data while retaining compatibility with guidance techniques such as IPAdapter. GIMDiffusion enables the generation of 3D assets at speeds comparable to current Text-to-Image models, without being restricted to manifold meshes during either training or inference. We simultaneously generate a UV unwrapping for the objects; the resulting objects consist of semantically meaningful parts as well as internal structures, enhancing both usability and versatility.
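At the core of the approach are geometry images: 2D images whose pixel values store the XYZ positions of points on the object's surface, so a standard image diffusion backbone can synthesize geometry directly. As a minimal illustration (not the authors' code), the sketch below shows how such an image, together with a hypothetical validity mask separating surface charts from background, could be stitched back into a triangle mesh by connecting adjacent foreground pixels:

```python
# Minimal sketch, assuming a geometry image stores XYZ positions per pixel
# and a boolean mask marks valid (on-surface) pixels. The 2x2 quad-stitching
# rule and the mask channel are illustrative assumptions, not the paper's
# exact remeshing procedure.
import numpy as np

def geometry_image_to_mesh(gim: np.ndarray, mask: np.ndarray):
    """gim: (H, W, 3) array of XYZ positions; mask: (H, W) bool validity map."""
    h, w = mask.shape
    # Map each valid pixel to a vertex index; -1 marks background pixels.
    index = -np.ones((h, w), dtype=np.int64)
    index[mask] = np.arange(mask.sum())
    vertices = gim[mask]

    faces = []
    for y in range(h - 1):
        for x in range(w - 1):
            # Corners of the 2x2 pixel quad, in raster order.
            a, b = index[y, x], index[y, x + 1]
            c, d = index[y + 1, x], index[y + 1, x + 1]
            if min(a, b, c, d) < 0:
                continue  # skip quads that touch background
            # Split the quad into two triangles.
            faces.append((a, b, d))
            faces.append((a, d, c))
    return vertices, np.asarray(faces, dtype=np.int64)
```

Because each chart of the UV atlas occupies its own region of the image, stitching charts independently in this way would naturally yield separate parts, consistent with the semantically meaningful decomposition the abstract describes.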
Cite
Text
Elizarov et al. "Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with Image-Based Surface Representation." International Conference on Learning Representations, 2025.
Markdown
[Elizarov et al. "Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with Image-Based Surface Representation." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/elizarov2025iclr-geometry/)
BibTeX
@inproceedings{elizarov2025iclr-geometry,
  title     = {{Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with Image-Based Surface Representation}},
  author    = {Elizarov, Slava and Rowles, Ciara and Donné, Simon},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/elizarov2025iclr-geometry/}
}