Talking Face Generation by Conditional Recurrent Adversarial Network

Song, Yang; Zhu, Jingwen; Li, Dawei; Wang, Andy; Qi, Hairong

doi:10.24963/IJCAI.2019/129

IJCAI 2019 pp. 919-925

Abstract

Given an arbitrary face image and an arbitrary speech clip, the proposed work aims to generate a talking face video with accurate lip synchronization. Existing works either ignore temporal dependency across video frames, which yields abrupt facial and lip movements, or are limited to generating talking face videos of a specific person, which limits their ability to generalize. We propose a novel conditional recurrent generation network that incorporates both image and audio features in the recurrent unit to capture temporal dependency. To achieve both image- and video-realism, the network includes a pair of spatial-temporal discriminators for better image and video quality. Since accurate lip synchronization is essential to successful talking face video generation, we also construct a lip-reading discriminator to improve lip-sync accuracy. We further extend the network to model the natural pose and expression of talking faces on the Obama Dataset. Extensive experimental results demonstrate the superiority of our framework over state-of-the-art methods in terms of visual quality, lip-sync accuracy, and smooth transitions in both lip and facial movement.
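The abstract's central architectural idea, feeding both the face-image feature and the audio feature into the recurrent unit at every time step, can be illustrated with a minimal sketch. This is not the authors' implementation: the plain tanh RNN cell, the feature dimensions, the random stand-in weights, and the names `ConditionalRecurrentCell` and `unroll` are all hypothetical, and the image encoder, frame decoder, and discriminators are omitted.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights standing in for learned parameters (assumption)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class ConditionalRecurrentCell:
    """Hypothetical tanh RNN cell: at every step its input is the concatenation
    of a fixed identity (image) feature and the current audio feature, so each
    hidden state depends on the face, the speech, and the previous state."""

    def __init__(self, img_dim: int, audio_dim: int, hidden_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.w_x = random_matrix(hidden_dim, img_dim + audio_dim, rng)
        self.w_h = random_matrix(hidden_dim, hidden_dim, rng)
        self.b = [0.0] * hidden_dim

    def step(self, h_prev: Vector, img_feat: Vector, audio_feat: Vector) -> Vector:
        x = img_feat + audio_feat  # list concatenation = feature concatenation
        pre = [a + c + d for a, c, d in
               zip(matvec(self.w_x, x), matvec(self.w_h, h_prev), self.b)]
        return [math.tanh(p) for p in pre]

    def unroll(self, img_feat: Vector, audio_seq: List[Vector]) -> List[Vector]:
        """Run the cell over a sequence of audio features, returning one hidden
        state per video frame (a decoder would map each state to an image)."""
        h = [0.0] * self.hidden_dim
        states = []
        for audio_feat in audio_seq:
            h = self.step(h, img_feat, audio_feat)
            states.append(h)
        return states


cell = ConditionalRecurrentCell(img_dim=4, audio_dim=3, hidden_dim=5)
face = [0.2, -0.1, 0.5, 0.0]                            # hypothetical identity embedding
audio = [[0.1 * t, 0.0, -0.1 * t] for t in range(6)]    # hypothetical per-frame audio features
states = cell.unroll(face, audio)
print(len(states), len(states[0]))  # 6 5
```

Re-injecting the same identity feature at every step is one simple way to keep each generated frame tied to the input face while the hidden state carries temporal context across frames; the paper's actual recurrent unit and fusion scheme may differ (for example, an LSTM or GRU with learned encoders).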

Cite

Text

Song et al. "Talking Face Generation by Conditional Recurrent Adversarial Network." International Joint Conference on Artificial Intelligence, 2019. doi:10.24963/IJCAI.2019/129

Markdown

[Song et al. "Talking Face Generation by Conditional Recurrent Adversarial Network." International Joint Conference on Artificial Intelligence, 2019.](https://mlanthology.org/ijcai/2019/song2019ijcai-talking/) doi:10.24963/IJCAI.2019/129

BibTeX

@inproceedings{song2019ijcai-talking,
  title     = {{Talking Face Generation by Conditional Recurrent Adversarial Network}},
  author    = {Song, Yang and Zhu, Jingwen and Li, Dawei and Wang, Andy and Qi, Hairong},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2019},
  pages     = {919-925},
  doi       = {10.24963/IJCAI.2019/129},
  url       = {https://mlanthology.org/ijcai/2019/song2019ijcai-talking/}
}