Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using Only Images
Abstract
Generating 3D faces from textual descriptions has a multitude of applications, such as gaming, movies, and robotics. Recent progress has demonstrated the success of unconditional 3D face generation and text-to-3D shape generation. However, because paired text-3D face data is limited, text-driven 3D face generation remains an open problem. In this paper, we propose a text-guided 3D face generation method, referred to as TG-3DFace, for generating realistic 3D faces using text guidance. Specifically, we adopt an unconditional 3D face generation framework and equip it with text conditions, so that it learns text-guided 3D face generation from only text-2D face data. On top of that, we propose two text-to-face cross-modal alignment techniques, global contrastive learning and a fine-grained alignment module, to facilitate high semantic consistency between generated 3D faces and input texts. In addition, we present directional classifier guidance during inference, which encourages creativity for out-of-domain generations. Compared to existing methods, TG-3DFace creates more realistic and aesthetically pleasing 3D faces, improving multi-view consistency (MVIC) by 9% over Latent3D. The rendered face images generated by TG-3DFace achieve better FID and CLIP scores than text-to-2D face/image generation models, demonstrating our method's superiority in generating realistic and semantically consistent textures.
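The abstract does not spell out the form of the global contrastive learning objective. As a rough illustration only, the sketch below shows a generic CLIP-style symmetric contrastive (InfoNCE) loss between a batch of text embeddings and the embeddings of their matching face images. The function names, the temperature value, and the plain-list representation are all assumptions made for this example and are not taken from the paper.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length (assumes v is not the zero vector).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def log_softmax(row):
    # Numerically stable log-softmax over a list of logits.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def contrastive_loss(text_emb, face_emb, temperature=0.07):
    # Hypothetical helper: symmetric InfoNCE loss where text_emb[i] and
    # face_emb[i] are a matching pair and all other pairs are negatives.
    t = [l2_normalize(v) for v in text_emb]
    f = [l2_normalize(v) for v in face_emb]
    n = len(t)
    # Cosine similarity matrix scaled by the (assumed) temperature.
    logits = [
        [sum(a * b for a, b in zip(t[i], f[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Text-to-face direction: each row should peak on its diagonal entry.
    loss_t2f = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Face-to-text direction: each column should peak on its diagonal entry.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_f2t = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (loss_t2f + loss_f2t)

# Toy usage: aligned pairs give a much lower loss than swapped pairs.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In a real system the embeddings would come from learned text and image encoders and the loss would be minimized by gradient descent; the toy vectors here only show that the loss is small when each text is closest to its own face embedding.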
Cite
Text
Yu et al. "Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using Only Images." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.01406
Markdown
[Yu et al. "Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using Only Images." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/yu2023iccv-highfidelity/) doi:10.1109/ICCV51070.2023.01406
BibTeX
@inproceedings{yu2023iccv-highfidelity,
title = {{Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using Only Images}},
author = {Yu, Cuican and Lu, Guansong and Zeng, Yihan and Sun, Jian and Liang, Xiaodan and Li, Huibin and Xu, Zongben and Xu, Songcen and Zhang, Wei and Xu, Hang},
booktitle = {International Conference on Computer Vision},
year = {2023},
pages = {15326-15337},
doi = {10.1109/ICCV51070.2023.01406},
url = {https://mlanthology.org/iccv/2023/yu2023iccv-highfidelity/}
}