Coherent 3D Portrait Video Reconstruction via Triplane Fusion

Abstract

Recent breakthroughs in single-image 3D portrait reconstruction have enabled telepresence systems to stream 3D portrait videos from a single camera in real time, democratizing telepresence. However, per-frame 3D reconstruction is temporally inconsistent and fails to retain the user's appearance across frames. On the other hand, self-reenactment methods can render coherent 3D portraits by driving a 3D avatar built from a single reference image, but they fail to faithfully preserve the user's per-frame appearance (e.g., instantaneous facial expressions and lighting). As a result, neither framework is an ideal solution for democratized 3D telepresence. In this work, we address this dilemma with a solution that maintains both a coherent identity and dynamic per-frame appearance, enabling the best possible realism. To this end, we propose a new fusion-based method that takes the best of both worlds: it fuses a canonical 3D prior from a reference view with dynamic appearance from per-frame input views, producing temporally stable 3D videos that faithfully reconstruct the user's per-frame appearance. Trained only on synthetic data produced by an expression-conditioned 3D GAN, our encoder-based method achieves state-of-the-art results in both 3D reconstruction and temporal consistency on in-studio and in-the-wild datasets.
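
The title refers to triplanes, the factorized 3D representation (three axis-aligned 2D feature planes) used by 3D GANs such as EG3D. As a rough illustration of what fusing a reference triplane with a per-frame triplane could look like, the following is a minimal sketch under stated assumptions, not the authors' method: the abstract does not specify the fusion operator, so the per-texel convex blend with caller-supplied weight maps, the summation of the three sampled plane features, and all names (bilinear_sample, sample_triplane, fuse_triplanes, constant_plane, PLANE_KEYS) are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# A feature plane is an H x W grid of C-dimensional feature vectors.
Plane = List[List[List[float]]]
# A triplane holds three axis-aligned planes keyed "xy", "xz", "yz".
Triplane = Dict[str, Plane]
PLANE_KEYS = ("xy", "xz", "yz")


def bilinear_sample(plane: Plane, u: float, v: float) -> List[float]:
    """Bilinearly sample `plane` at normalized coordinates u, v in [-1, 1].

    u indexes columns (width), v indexes rows (height); coordinates outside
    [-1, 1] are clamped to the border.
    """
    h, w = len(plane), len(plane[0])
    # Map [-1, 1] to continuous pixel coordinates and clamp to the grid.
    x = min(max((u + 1.0) * 0.5 * (w - 1), 0.0), w - 1.0)
    y = min(max((v + 1.0) * 0.5 * (h - 1), 0.0), h - 1.0)
    x0, y0 = int(x), int(y)  # floor, since x and y are non-negative
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    out = []
    for c in range(len(plane[0][0])):
        top = (1.0 - fx) * plane[y0][x0][c] + fx * plane[y0][x1][c]
        bottom = (1.0 - fx) * plane[y1][x0][c] + fx * plane[y1][x1][c]
        out.append((1.0 - fy) * top + fy * bottom)
    return out


def sample_triplane(tri: Triplane, x: float, y: float, z: float) -> List[float]:
    """Project a 3D point onto the three planes and sum the sampled features."""
    feats = [
        bilinear_sample(tri["xy"], x, y),
        bilinear_sample(tri["xz"], x, z),
        bilinear_sample(tri["yz"], y, z),
    ]
    return [sum(vals) for vals in zip(*feats)]


def fuse_triplanes(ref: Triplane, frame: Triplane,
                   weights: Dict[str, List[List[float]]]) -> Triplane:
    """Hypothetical per-texel blend: w * frame + (1 - w) * ref, w in [0, 1].

    Here the weight maps are supplied by the caller; a learned fusion module
    would instead predict them from the inputs.
    """
    fused: Triplane = {}
    for key in PLANE_KEYS:
        fused[key] = [
            [
                [wt * f + (1.0 - wt) * r for r, f in zip(r_cell, f_cell)]
                for r_cell, f_cell, wt in zip(r_row, f_row, w_row)
            ]
            for r_row, f_row, w_row in zip(ref[key], frame[key], weights[key])
        ]
    return fused


def constant_plane(h: int, w: int, value: float) -> Plane:
    """Helper: an h x w plane whose single feature channel equals `value`."""
    return [[[value] for _ in range(w)] for _ in range(h)]


if __name__ == "__main__":
    ref = {k: constant_plane(4, 4, 1.0) for k in PLANE_KEYS}    # stand-in for the canonical prior
    frame = {k: constant_plane(4, 4, 3.0) for k in PLANE_KEYS}  # stand-in for a per-frame triplane
    half = {k: [[0.5] * 4 for _ in range(4)] for k in PLANE_KEYS}
    fused = fuse_triplanes(ref, frame, half)
    # Each fused plane holds 2.0, so the summed feature is close to 6.0.
    print(sample_triplane(fused, 0.1, -0.3, 0.7))
```

A weight of 0 everywhere reproduces the reference triplane (stable but unresponsive to the current frame), while a weight of 1 reproduces the per-frame triplane (responsive but prone to flicker); spatially varying weights are one simple way to trade the two off, which is the tension the abstract describes.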

Cite

Text

Wang et al. "Coherent 3D Portrait Video Reconstruction via Triplane Fusion." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.01001

Markdown

[Wang et al. "Coherent 3D Portrait Video Reconstruction via Triplane Fusion." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/wang2025cvpr-coherent/) doi:10.1109/CVPR52734.2025.01001

BibTeX

@inproceedings{wang2025cvpr-coherent,
  title     = {{Coherent 3D Portrait Video Reconstruction via Triplane Fusion}},
  author    = {Wang, Shengze and Li, Xueting and Liu, Chao and Chan, Matthew and Stengel, Michael and Fuchs, Henry and De Mello, Shalini and Nagano, Koki},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {10712--10722},
  doi       = {10.1109/CVPR52734.2025.01001},
  url       = {https://mlanthology.org/cvpr/2025/wang2025cvpr-coherent/}
}