VOODOO 3D: Volumetric Portrait Disentanglement for One-Shot 3D Head Reenactment

Abstract

We present a 3D-aware one-shot head reenactment method based on a fully volumetric neural disentanglement framework for source appearance and driver expressions. Our method runs in real time and produces high-fidelity, view-consistent output suitable for 3D teleconferencing systems based on holographic displays. Existing cutting-edge 3D-aware reenactment methods often use neural radiance fields or 3D meshes to produce a view-consistent appearance encoding, but they rely on linear face models such as 3DMM to disentangle that encoding from facial expressions. As a result, their reenactment results often exhibit identity leakage from the driver or unnatural expressions. To address these problems, we propose a neural self-supervised disentanglement approach that lifts both the source image and the driver video frame into a shared 3D volumetric representation based on tri-planes. This representation can then be freely manipulated with expression tri-planes extracted from the driving images and rendered from an arbitrary view using neural radiance fields. We achieve this disentanglement via self-supervised learning on a large in-the-wild video dataset. We further introduce a highly effective fine-tuning approach that improves the generalizability of the 3D lifting using the same real-world data. We demonstrate state-of-the-art performance on a wide range of datasets and showcase high-quality 3D-aware head reenactment on highly challenging and diverse subjects, including non-frontal head poses and complex expressions for both source and driver.
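
To illustrate the tri-plane representation the abstract refers to, here is a minimal NumPy sketch of how a feature for a 3D point might be read from three axis-aligned feature planes. It is not the authors' implementation: the function names `sample_plane` and `query_triplane`, the plane ordering (XY, XZ, YZ), the `[-1, 1]` coordinate range, and aggregation by summation are all assumptions made for illustration. In a full pipeline, features like these would typically be decoded into density and color and then volume-rendered along camera rays.

```python
import numpy as np

def sample_plane(plane, u, v):
    """Bilinearly sample a (C, H, W) feature plane at coordinates u, v in [-1, 1].

    Returns an (N, C) array of features. Requires H >= 2 and W >= 2.
    """
    _, H, W = plane.shape
    # Map normalized coordinates to continuous pixel coordinates.
    x = (u + 1.0) * 0.5 * (W - 1)
    y = (v + 1.0) * 0.5 * (H - 1)
    # Index of the top-left neighbor, clipped so x0 + 1 and y0 + 1 stay in bounds.
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    wx = x - x0
    wy = y - y0
    # Interpolate along x on the two neighboring rows, then along y.
    top = plane[:, y0, x0] * (1 - wx) + plane[:, y0, x0 + 1] * wx
    bot = plane[:, y0 + 1, x0] * (1 - wx) + plane[:, y0 + 1, x0 + 1] * wx
    return (top * (1 - wy) + bot * wy).T

def query_triplane(planes, points):
    """Look up features for 3D points from a (3, C, H, W) tri-plane.

    Assumption: planes are ordered XY, XZ, YZ and per-plane features are summed.
    points is an (N, 3) array with coordinates in [-1, 1].
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    return (sample_plane(planes[0], x, y)
            + sample_plane(planes[1], x, z)
            + sample_plane(planes[2], y, z))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    planes = rng.standard_normal((3, 32, 64, 64))   # hypothetical sizes
    points = rng.uniform(-1.0, 1.0, size=(5, 3))
    print(query_triplane(planes, points).shape)
```

Projecting each point onto three 2D planes keeps memory at three feature images rather than a dense voxel grid, which is one reason tri-planes are a common choice for real-time volumetric rendering.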

Cite

Text

Tran et al. "VOODOO 3D: Volumetric Portrait Disentanglement for One-Shot 3D Head Reenactment." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00984

Markdown

[Tran et al. "VOODOO 3D: Volumetric Portrait Disentanglement for One-Shot 3D Head Reenactment." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/tran2024cvpr-voodoo/) doi:10.1109/CVPR52733.2024.00984

BibTeX

@inproceedings{tran2024cvpr-voodoo,
  title     = {{VOODOO 3D: Volumetric Portrait Disentanglement for One-Shot 3D Head Reenactment}},
  author    = {Tran, Phong and Zakharov, Egor and Ho, Long-Nhat and Tran, Anh Tuan and Hu, Liwen and Li, Hao},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {10336-10348},
  doi       = {10.1109/CVPR52733.2024.00984},
  url       = {https://mlanthology.org/cvpr/2024/tran2024cvpr-voodoo/}
}