Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using Only Images

Yu, Cuican; Lu, Guansong; Zeng, Yihan; Sun, Jian; Liang, Xiaodan; Li, Huibin; Xu, Zongben; Xu, Songcen; Zhang, Wei; Xu, Hang

doi:10.1109/ICCV51070.2023.01406

ICCV 2023 pp. 15326-15337

Abstract

Generating 3D faces from textual descriptions has many applications, such as gaming, film, and robotics. Recent progress has demonstrated the success of unconditional 3D face generation and text-to-3D shape generation. However, because paired text and 3D face data are scarce, text-driven 3D face generation remains an open problem. In this paper, we propose a text-guided 3D face generation method, referred to as TG-3DFace, for generating realistic 3D faces from text guidance. Specifically, we adopt an unconditional 3D face generation framework and equip it with text conditions, so that it learns text-guided 3D face generation from text and 2D face image data alone. On top of that, we propose two text-to-face cross-modal alignment techniques, global contrastive learning and a fine-grained alignment module, to promote high semantic consistency between generated 3D faces and input texts. In addition, we present directional classifier guidance during inference, which encourages creativity for out-of-domain generations. Compared to existing methods, TG-3DFace creates more realistic and aesthetically pleasing 3D faces, improving multi-view consistency (MVIC) by 9% over Latent3D. The rendered face images generated by TG-3DFace achieve better FID and CLIP scores than text-to-2D face/image generation models, demonstrating its advantage in generating realistic and semantically consistent textures.
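The abstract names global contrastive learning as one of the two alignment techniques but does not give its form. As a rough illustration only, the sketch below assumes a CLIP-style symmetric InfoNCE loss over a batch of matched text and face embeddings. The function names, the temperature value, and the toy embeddings are all hypothetical and are not taken from the paper.

```python
import math

def l2_normalize(v):
    # Scale a non-zero vector to unit length so dot products are cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy_diag(logits):
    # Mean cross-entropy where the correct class for row i is column i.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)

def symmetric_contrastive_loss(text_emb, face_emb, temperature=0.07):
    # Assumed form: pair i of text_emb and face_emb is a match,
    # every other combination in the batch is a negative.
    t = [l2_normalize(v) for v in text_emb]
    f = [l2_normalize(v) for v in face_emb]
    logits = [[sum(a * b for a, b in zip(ti, fj)) / temperature for fj in f]
              for ti in t]
    logits_t = [list(col) for col in zip(*logits)]  # face-to-text direction
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))

# Toy embeddings (hypothetical): two captions and two rendered faces.
texts = [[1.0, 0.0], [0.0, 1.0]]
aligned = symmetric_contrastive_loss(texts, [[2.0, 0.1], [0.1, 2.0]])
shuffled = symmetric_contrastive_loss(texts, [[0.1, 2.0], [2.0, 0.1]])
print(aligned < shuffled)
```

Under this assumed loss, correctly paired embeddings score lower than mismatched ones, which is the behaviour a global text-to-face alignment objective is meant to reward.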

Cite

Text

Yu et al. "Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using Only Images." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.01406

Markdown

[Yu et al. "Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using Only Images." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/yu2023iccv-highfidelity/) doi:10.1109/ICCV51070.2023.01406

BibTeX

@inproceedings{yu2023iccv-highfidelity,
  title     = {{Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using Only Images}},
  author    = {Yu, Cuican and Lu, Guansong and Zeng, Yihan and Sun, Jian and Liang, Xiaodan and Li, Huibin and Xu, Zongben and Xu, Songcen and Zhang, Wei and Xu, Hang},
  booktitle = {International Conference on Computer Vision},
  year      = {2023},
  pages     = {15326-15337},
  doi       = {10.1109/ICCV51070.2023.01406},
  url       = {https://mlanthology.org/iccv/2023/yu2023iccv-highfidelity/}
}