FML: Face Model Learning from Videos

Abstract

Monocular image-based 3D reconstruction of faces is a long-standing problem in computer vision. Since image data is a 2D projection of a 3D face, the resulting depth ambiguity makes the problem ill-posed. Most existing methods rely on data-driven priors that are built from limited 3D face scans. In contrast, we propose multi-frame video-based self-supervised training of a deep network that (i) learns a face identity model both in shape and appearance while (ii) jointly learning to reconstruct 3D faces. Our face model is learned using only corpora of in-the-wild video clips collected from the Internet. This virtually endless source of training data enables learning of a highly general 3D face model. To achieve this, we propose a novel multi-frame consistency loss that ensures consistent shape and appearance across multiple frames of a subject's face, thus minimizing depth ambiguity. At test time, we can use an arbitrary number of frames, so that we can perform both monocular and multi-frame reconstruction.
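
The idea of a multi-frame consistency loss can be illustrated with a small toy sketch. This is not the paper's implementation: the linear shape model, the mean pooling of per-frame identity codes, the squared-error residual, and every name below (`fuse_identity`, `multi_frame_consistency_loss`, `basis`, `per_frame_offsets`) are assumptions made for illustration only. The point it shows is that one identity code is shared by all frames of a subject, while each frame keeps its own frame-specific parameters, and that the same function accepts one frame or many.

```python
import numpy as np


def fuse_identity(per_frame_codes):
    """Pool per-frame identity estimates into one shared code.

    Mean pooling is an assumption of this sketch, not a claim about the paper.
    """
    return np.mean(np.asarray(per_frame_codes, dtype=float), axis=0)


def multi_frame_consistency_loss(per_frame_codes, per_frame_offsets, targets,
                                 mean_shape, basis):
    """Average squared error over frames, using ONE shared identity shape.

    per_frame_codes   : (F, K) identity estimate from each of F frames
    per_frame_offsets : (F, D) frame-specific deformation (hypothetical stand-in
                        for expression and pose)
    targets           : (F, D) per-frame observations to be explained
    mean_shape        : (D,)   mean of the toy linear shape model
    basis             : (D, K) identity basis of the toy linear shape model
    """
    shared_code = fuse_identity(per_frame_codes)
    identity_shape = mean_shape + basis @ shared_code  # same for every frame
    per_frame_errors = [
        np.sum((identity_shape + offset - target) ** 2)
        for offset, target in zip(per_frame_offsets, targets)
    ]
    return float(np.mean(per_frame_errors))


# Toy data: 6-dimensional shape, 2-dimensional identity code, 2 frames.
mean_shape = np.zeros(6)
basis = np.eye(6)[:, :2]
codes = np.array([[1.0, 0.0], [3.0, 0.0]])
offsets = np.zeros((2, 6))
targets = np.zeros((2, 6))

multi_frame_loss = multi_frame_consistency_loss(
    codes, offsets, targets, mean_shape, basis)

# The same function handles the monocular case (a single frame).
monocular_loss = multi_frame_consistency_loss(
    codes[:1], offsets[:1], targets[:1], mean_shape, basis)
```

Because the identity shape is computed once and reused for every frame, a code that fits one frame well but contradicts the others is penalized, which is the intuition behind using multiple frames to reduce depth ambiguity.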

Cite

Text

Tewari et al. "FML: Face Model Learning from Videos." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019. doi:10.1109/CVPR.2019.01107

Markdown

[Tewari et al. "FML: Face Model Learning from Videos." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.](https://mlanthology.org/cvpr/2019/tewari2019cvpr-fml/) doi:10.1109/CVPR.2019.01107

BibTeX

@inproceedings{tewari2019cvpr-fml,
  title     = {{FML: Face Model Learning from Videos}},
  author    = {Tewari, Ayush and Bernard, Florian and Garrido, Pablo and Bharaj, Gaurav and Elgharib, Mohamed and Seidel, Hans-Peter and P{\'e}rez, Patrick and Zollh{\"o}fer, Michael and Theobalt, Christian},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2019},
  doi       = {10.1109/CVPR.2019.01107},
  url       = {https://mlanthology.org/cvpr/2019/tewari2019cvpr-fml/}
}