High-Fidelity 3D Textured Shapes Generation by Sparse Encoding and Adversarial Decoding

Zuo, Qi; Gu, Xiaodong; Dong, Yuan; Zhao, Zhengyi; Yuan, Weihao; Lingteng, Qiu; Bo, Liefeng; Dong, Zilong

doi:10.1007/978-3-031-72684-2_4

ECCV 2024

Abstract

3D vision is inherently characterized by sparse spatial structures, which calls for an efficient paradigm tailored to 3D generation. A second challenge is the amount of training data: relying only on limited 3D data undeniably hurts generalization. To address both issues, we design a 3D generation framework that retains most of the building blocks of StableDiffusion, with minimal adaptations for textured shape generation. We design a Sparse Encoding Module for detail preservation and an Adversarial Decoding Module for better shape recovery. Moreover, we clean the data and build a benchmark on the largest 3D dataset (Objaverse). We drop the concept of a ‘specific class’ and treat 3D textured shape generation as an open-vocabulary problem. We first validate our network design on ShapeNetV2 (55K samples) on single-class unconditional generation and multi-class conditional generation tasks. We then report metrics on the processed G-Objaverse (200K samples) on the image-conditional generation task. Extensive experiments demonstrate that our proposal outperforms SOTA methods and takes a further step towards open-vocabulary 3D generation. We release the processed data at https://aigc3d.github.io/gobjaverse/.

Cite

Text

Zuo et al. "High-Fidelity 3D Textured Shapes Generation by Sparse Encoding and Adversarial Decoding." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72684-2_4

Markdown

[Zuo et al. "High-Fidelity 3D Textured Shapes Generation by Sparse Encoding and Adversarial Decoding." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/zuo2024eccv-highfidelity/) doi:10.1007/978-3-031-72684-2_4

BibTeX

@inproceedings{zuo2024eccv-highfidelity,
  title     = {{High-Fidelity 3D Textured Shapes Generation by Sparse Encoding and Adversarial Decoding}},
  author    = {Zuo, Qi and Gu, Xiaodong and Dong, Yuan and Zhao, Zhengyi and Yuan, Weihao and Lingteng, Qiu and Bo, Liefeng and Dong, Zilong},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-72684-2_4},
  url       = {https://mlanthology.org/eccv/2024/zuo2024eccv-highfidelity/}
}