Text/Speech-Driven Full-Body Animation

Abstract

Due to increasing demand from films and games, synthesizing 3D avatar animation has attracted much attention recently. In this work, we present a production-ready text/speech-driven full-body animation synthesis system. Given text and the corresponding speech, our system synthesizes face and body animations simultaneously, which are then skinned and rendered to produce a video stream. We adopt a learning-based approach to synthesize facial animation and a graph-based approach to animate the body, which generates high-quality avatar animation efficiently and robustly. Our results demonstrate that the generated avatar animations are realistic, diverse, and strongly correlated with the input text and speech.
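As a rough illustration only, the pipeline the abstract describes (text and speech in, face and body animation streams synthesized in parallel, then merged per frame for skinning and rendering) might be organized as sketched below. Every function, class, and parameter dimension here is a hypothetical placeholder, not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class AnimationFrame:
    face_params: list  # e.g. blendshape weights per frame (hypothetical representation)
    body_pose: list    # e.g. joint rotation parameters per frame (hypothetical representation)

def synthesize_face(speech_features):
    # Stand-in for the learning-based facial animation model:
    # one face-parameter vector per input speech frame.
    return [[0.0] * 52 for _ in speech_features]

def synthesize_body(text_tokens, n_frames):
    # Stand-in for the graph-based body motion synthesis:
    # one body-pose vector per output frame, conditioned on the text.
    return [[0.0] * 72 for _ in range(n_frames)]

def animate(text_tokens, speech_features):
    """Combine face and body streams into per-frame animation data,
    which a downstream skinning/rendering stage would turn into video."""
    face = synthesize_face(speech_features)
    body = synthesize_body(text_tokens, len(face))
    return [AnimationFrame(f, b) for f, b in zip(face, body)]

frames = animate(["hello"], [0.1, 0.2, 0.3])
```

The point of the sketch is only the data flow: the two synthesis branches run independently and are aligned frame-by-frame before rendering.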

Cite

Text

Zhuang et al. "Text/Speech-Driven Full-Body Animation." International Joint Conference on Artificial Intelligence, 2022. doi:10.24963/IJCAI.2022/863

Markdown

[Zhuang et al. "Text/Speech-Driven Full-Body Animation." International Joint Conference on Artificial Intelligence, 2022.](https://mlanthology.org/ijcai/2022/zhuang2022ijcai-text/) doi:10.24963/IJCAI.2022/863

BibTeX

@inproceedings{zhuang2022ijcai-text,
  title     = {{Text/Speech-Driven Full-Body Animation}},
  author    = {Zhuang, Wenlin and Qi, Jinwei and Zhang, Peng and Zhang, Bang and Tan, Ping},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2022},
  pages     = {5956--5959},
  doi       = {10.24963/IJCAI.2022/863},
  url       = {https://mlanthology.org/ijcai/2022/zhuang2022ijcai-text/}
}