Learning Person-Specific Animatable Face Models from In-the-Wild Images via a Shared Base Model

Abstract

Training a generic 3D face reconstruction model in a self-supervised manner on large-scale, in-the-wild 2D face image datasets improves robustness to varying lighting conditions and occlusions, while allowing the model to capture animatable wrinkle details across diverse facial expressions. However, a generic model often fails to adequately represent the unique characteristics of specific individuals. In this paper, we propose a method that trains a generic base model and then adapts it into person-specific models by integrating lightweight adapters into the large-parameter ViT-MAE base model. These person-specific models excel at capturing individual facial shapes and detailed features while preserving the base model's robustness and its prior knowledge of detail variations. During training, we introduce a silhouette vertex re-projection loss to address boundary "landmark marching" issues on the 3D face caused by pose variations. Additionally, we employ a novel teacher-student loss that exploits the inherent strength of UNet in feature boundary localization to train our detail MAE. Quantitative and qualitative experiments demonstrate that our approach achieves state-of-the-art performance in face alignment, detail accuracy, and detail richness. The source code is available at https://github.com/danielmao2000/person-specific-animatable-face.
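As a rough illustration of the adapter mechanism the abstract alludes to, the sketch below shows a bottleneck adapter with a residual connection, the standard way to insert a small trainable module into a frozen transformer block so that person-specific fine-tuning updates only a few parameters. All dimensions, the down/up projection design, and the zero initialization are common conventions assumed for illustration, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_adapter(dim, bottleneck):
    """Bottleneck adapter parameters: down-project, nonlinearity, up-project.
    The up-projection starts at zero, so the adapter is initially an
    identity mapping (a common choice for stable fine-tuning)."""
    return {
        "W_down": rng.standard_normal((dim, bottleneck)) * 0.02,
        "b_down": np.zeros(bottleneck),
        "W_up": np.zeros((bottleneck, dim)),  # zero-init: no change at start
        "b_up": np.zeros(dim),
    }

def adapter_forward(params, x):
    """Residual adapter: x + up(relu(down(x)))."""
    h = np.maximum(x @ params["W_down"] + params["b_down"], 0.0)
    return x + h @ params["W_up"] + params["b_up"]

tokens = rng.standard_normal((4, 768))         # 4 patch tokens, ViT-Base width
adapter = make_adapter(dim=768, bottleneck=64)
out = adapter_forward(adapter, tokens)
print(np.allclose(out, tokens))                # True: identity at initialization
```

During transfer, only the adapter parameters would be trained while the large ViT-MAE weights stay frozen, which is what lets a person-specific model inherit the base model's robustness and detail priors.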

Cite

Text

Mao et al. "Learning Person-Specific Animatable Face Models from In-the-Wild Images via a Shared Base Model." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.00526

Markdown

[Mao et al. "Learning Person-Specific Animatable Face Models from In-the-Wild Images via a Shared Base Model." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/mao2025cvpr-learning/) doi:10.1109/CVPR52734.2025.00526

BibTeX

@inproceedings{mao2025cvpr-learning,
  title     = {{Learning Person-Specific Animatable Face Models from In-the-Wild Images via a Shared Base Model}},
  author    = {Mao, Yuxiang and Fan, Zhenfeng and Zhang, ZhiJie and Zhang, Zhiheng and Xia, Shihong},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {5602--5613},
  doi       = {10.1109/CVPR52734.2025.00526},
  url       = {https://mlanthology.org/cvpr/2025/mao2025cvpr-learning/}
}