Disentangled Clothed Avatar Generation from Text Descriptions

Abstract

In this paper, we introduce a novel text-to-avatar generation method that generates the human body and the clothes separately and enables high-quality animation of the generated avatar. While recent advances in text-to-avatar generation have produced diverse human avatars from text prompts, these methods typically combine all elements (clothes, hair, and body) into a single 3D representation. Such an entangled representation makes downstream tasks such as editing and animation difficult. To overcome these limitations, we propose a novel disentangled 3D avatar representation named Sequentially Offset-SMPL (SO-SMPL), built upon the SMPL model. SO-SMPL represents the human body and the clothes with two separate meshes but associates them through offsets to ensure physical alignment between the body and the clothes. We then design a Score Distillation Sampling (SDS)-based distillation framework to generate the proposed SO-SMPL representation from text prompts. Our approach not only achieves higher texture and geometry quality and better semantic alignment with text prompts, but also significantly improves the visual quality of character animation, virtual try-on, and avatar editing.
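To make the idea of sequential offsets more concrete, here is a minimal NumPy sketch written under our own assumptions; it is not the authors' implementation. It assumes the body layer is an SMPL mesh plus per-vertex offsets, and the clothes layer is a subset of body vertices (selected by a hypothetical index array `cloth_idx`) displaced by a second set of offsets. Both layers are then posed with standard linear blend skinning, with the garment inheriting the skinning weights of the body vertices it is offset from. All function names and the toy data are illustrative.

```python
import numpy as np


def so_smpl_rest_pose(v_smpl, body_offsets, cloth_idx, cloth_offsets):
    """Hypothetical sketch: build rest-pose body and clothes meshes from sequential offsets.

    v_smpl:        (V, 3) SMPL vertices for some shape parameters.
    body_offsets:  (V, 3) per-vertex offsets giving the body layer.
    cloth_idx:     (C,)   assumed indices of the body vertices the garment covers.
    cloth_offsets: (C, 3) per-vertex offsets of the garment from the body layer.
    """
    v_body = v_smpl + body_offsets
    # The garment is defined relative to the body layer, so the two stay aligned.
    v_cloth = v_body[cloth_idx] + cloth_offsets
    return v_body, v_cloth


def linear_blend_skinning(verts, weights, transforms):
    """Standard LBS: verts (N, 3), weights (N, J), transforms (J, 4, 4)."""
    homo = np.concatenate([verts, np.ones((verts.shape[0], 1))], axis=1)  # (N, 4)
    blended = np.einsum("nj,jab->nab", weights, transforms)               # (N, 4, 4)
    return np.einsum("nab,nb->na", blended, homo)[:, :3]


def pose_avatar(v_body, v_cloth, cloth_idx, weights, transforms):
    """Pose both layers; the garment reuses the skinning weights of its body vertices."""
    posed_body = linear_blend_skinning(v_body, weights, transforms)
    posed_cloth = linear_blend_skinning(v_cloth, weights[cloth_idx], transforms)
    return posed_body, posed_cloth


# Toy data (not real SMPL): 4 vertices on the x-axis, 2 joints.
v_smpl = np.zeros((4, 3))
v_smpl[:, 0] = np.arange(4)
body_offsets = np.zeros((4, 3))
cloth_idx = np.array([1, 2])
cloth_offsets = np.array([[0.0, 0.0, 0.1], [0.0, 0.0, 0.1]])
weights = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 0.0]])
transforms = np.stack([np.eye(4), np.eye(4)])
transforms[1, 1, 3] = 1.0  # joint 1 translates by +1 along y

v_body, v_cloth = so_smpl_rest_pose(v_smpl, body_offsets, cloth_idx, cloth_offsets)
posed_body, posed_cloth = pose_avatar(v_body, v_cloth, cloth_idx, weights, transforms)
```

Because the garment is expressed relative to the body, the same joint transforms move both layers together; in this toy case the 0.1 offset along z is preserved after posing. Under these assumptions, swapping `cloth_offsets` and `cloth_idx` changes the garment without touching the body layer, which is one plausible way such a representation could support animation and virtual try-on.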

Cite

Text

Wang et al. "Disentangled Clothed Avatar Generation from Text Descriptions." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72943-0_22

Markdown

[Wang et al. "Disentangled Clothed Avatar Generation from Text Descriptions." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/wang2024eccv-disentangled/) doi:10.1007/978-3-031-72943-0_22

BibTeX

@inproceedings{wang2024eccv-disentangled,
  title     = {{Disentangled Clothed Avatar Generation from Text Descriptions}},
  author    = {Wang, Jionghao and Liu, Yuan and Dou, Zhiyang and Yu, Zhengming and Liang, Yongqing and Lin, Cheng and Xie, Rong and Song, Li and Li, Xin and Wang, Wenping},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-72943-0_22},
  url       = {https://mlanthology.org/eccv/2024/wang2024eccv-disentangled/}
}