Learning Person-Specific Animatable Face Models from In-the-Wild Images via a Shared Base Model
Abstract
Training a generic 3D face reconstruction model in a self-supervised manner on large-scale, in-the-wild 2D face image datasets enhances robustness to varying lighting conditions and occlusions while allowing the model to capture animatable wrinkle details across diverse facial expressions. However, a generic model often fails to adequately represent the unique characteristics of specific individuals. In this paper, we propose a method to train a generic base model and then adapt it into person-specific models by integrating lightweight adapters into the large-parameter ViT-MAE base model. These person-specific models excel at capturing individual facial shapes and detailed features while preserving the robustness and the prior knowledge of detail variations from the base model. During training, we introduce a silhouette vertex re-projection loss to address boundary "landmark marching" issues on the 3D face caused by pose variations. Additionally, we employ an innovative teacher-student loss that leverages the inherent strength of UNet in feature boundary localization to train our detail MAE. Quantitative and qualitative experiments demonstrate that our approach achieves state-of-the-art performance in face alignment, detail accuracy, and richness. The source code is available at https://github.com/danielmao2000/person-specific-animatable-face.
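The abstract's core idea of specializing a frozen, shared base model with lightweight per-person adapters can be illustrated with a standard bottleneck adapter (down-projection, nonlinearity, up-projection, residual connection). This is a generic sketch of that family of techniques, not necessarily the authors' exact design; the dimensions and zero-initialization choice below are illustrative assumptions.

```python
import numpy as np

def adapter(x, W_down, W_up):
    """Bottleneck adapter applied to a transformer feature x.

    Only the small W_down / W_up matrices would be trained per person;
    the large base-model weights stay frozen and shared across people.
    """
    h = np.maximum(x @ W_down, 0.0)  # down-project to bottleneck, ReLU
    return x + h @ W_up              # up-project and add residual

# Hypothetical sizes: transformer width 768, bottleneck rank 64.
rng = np.random.default_rng(0)
d, r = 768, 64
W_down = rng.normal(0.0, 0.02, size=(d, r))
W_up = np.zeros((r, d))  # zero-init: adapter starts as the identity map
x = rng.normal(size=(1, d))
y = adapter(x, W_down, W_up)
assert np.allclose(y, x)  # identity at initialization, so base behavior is preserved
```

Zero-initializing the up-projection makes each adapter a no-op at the start of person-specific training, so the specialized model begins from exactly the base model's behavior and only gradually deviates from it.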
Cite
Text
Mao et al. "Learning Person-Specific Animatable Face Models from In-the-Wild Images via a Shared Base Model." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.00526
Markdown
[Mao et al. "Learning Person-Specific Animatable Face Models from In-the-Wild Images via a Shared Base Model." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/mao2025cvpr-learning/) doi:10.1109/CVPR52734.2025.00526
BibTeX
@inproceedings{mao2025cvpr-learning,
title = {{Learning Person-Specific Animatable Face Models from In-the-Wild Images via a Shared Base Model}},
author = {Mao, Yuxiang and Fan, Zhenfeng and Zhang, ZhiJie and Zhang, Zhiheng and Xia, Shihong},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2025},
pages = {5602-5613},
doi = {10.1109/CVPR52734.2025.00526},
url = {https://mlanthology.org/cvpr/2025/mao2025cvpr-learning/}
}