FaceFormer: Speech-Driven 3D Facial Animation with Transformers

Abstract

Speech-driven 3D facial animation is challenging due to the complex geometry of human faces and the limited availability of 3D audio-visual data. Prior works typically focus on learning phoneme-level features from short audio windows with limited context, which occasionally results in inaccurate lip movements. To tackle this limitation, we propose a Transformer-based autoregressive model, FaceFormer, which encodes the long-term audio context and autoregressively predicts a sequence of animated 3D face meshes. To cope with data scarcity, we integrate self-supervised pre-trained speech representations. We also devise two biased attention mechanisms well suited to this task: biased cross-modal multi-head (MH) attention, and biased causal MH self-attention with a periodic positional encoding strategy. The former effectively aligns the audio and motion modalities, whereas the latter enables generalization to longer audio sequences. Extensive experiments and a perceptual user study show that our approach outperforms existing state-of-the-art methods. The code and the video are available at: https://evelynfan.github.io/audio2face/.
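
The periodic positional encoding mentioned above can be illustrated with a minimal sketch. The abstract does not state the formula, so this assumes the standard sinusoidal Transformer encoding with the frame index wrapped by a period; the function name `periodic_positional_encoding` and the parameters `seq_len`, `d_model`, and `period` are hypothetical and are not taken from the authors' code. The intuition is that positions beyond the training length map onto encodings already seen during training, which may help with longer audio sequences.

```python
import numpy as np


def periodic_positional_encoding(seq_len, d_model, period):
    """Sinusoidal positional encoding that repeats every `period` frames.

    Assumption: this wraps the position index with a modulo before applying
    the usual sine/cosine encoding. `d_model` is assumed to be even.
    """
    # Wrap frame indices so that frame t and frame t + period share an encoding.
    positions = np.arange(seq_len) % period
    # One frequency per pair of (sin, cos) channels.
    even_channels = np.arange(0, d_model, 2)
    divisors = np.power(10000.0, even_channels / d_model)
    angles = positions[:, None] / divisors[None, :]

    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles)  # even channels
    encoding[:, 1::2] = np.cos(angles)  # odd channels
    return encoding
```

For example, `periodic_positional_encoding(100, 64, 25)` returns an array of shape `(100, 64)` in which rows 0, 25, 50, and 75 are identical.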

Cite

Text

Fan et al. "FaceFormer: Speech-Driven 3D Facial Animation with Transformers." Conference on Computer Vision and Pattern Recognition, 2022. doi:10.1109/CVPR52688.2022.01821

Markdown

[Fan et al. "FaceFormer: Speech-Driven 3D Facial Animation with Transformers." Conference on Computer Vision and Pattern Recognition, 2022.](https://mlanthology.org/cvpr/2022/fan2022cvpr-faceformer/) doi:10.1109/CVPR52688.2022.01821

BibTeX

@inproceedings{fan2022cvpr-faceformer,
  title     = {{FaceFormer: Speech-Driven 3D Facial Animation with Transformers}},
  author    = {Fan, Yingruo and Lin, Zhaojiang and Saito, Jun and Wang, Wenping and Komura, Taku},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2022},
  pages     = {18770--18780},
  doi       = {10.1109/CVPR52688.2022.01821},
  url       = {https://mlanthology.org/cvpr/2022/fan2022cvpr-faceformer/}
}