Robust Video Portrait Reenactment via Personalized Representation Quantization

Wang, Kaisiyuan; Liang, Changcheng; Zhou, Hang; Tang, Jiaxiang; Wu, Qianyi; He, Dongliang; Hong, Zhibin; Liu, Jingtuo; Ding, Errui; Liu, Ziwei; Wang, Jingdong

doi:10.1609/AAAI.V37I2.25354

AAAI 2023 pp. 2564-2572

Abstract

While progress has been made in portrait reenactment, producing high-fidelity and robust videos remains an open problem. Existing methods typically struggle with rarely seen target poses because of the limited source data. This paper proposes the Video Portrait via Non-local Quantization Modeling (VPNQ) framework, which produces pose- and disturbance-robust reenactable video portraits. Our key insight is to learn position-invariant quantized local patch representations and to build a mapping between simple driving signals and local textures through non-local spatio-temporal modeling. Specifically, instead of learning a universal quantized codebook, we find that a personalized one can be trained to better preserve the desired position-invariant local details. A simple representation, projected landmarks, can then serve as a sufficient driving signal, avoiding the need for 3D rendering. We then employ a carefully designed Spatio-Temporal Transformer to predict plausible and temporally consistent quantized tokens from the driving signal. The predicted codes are decoded back into robust, high-quality videos. Comprehensive experiments validate the effectiveness of our approach.
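The central idea in the abstract, representing local image patches as discrete tokens drawn from a codebook fitted to a single person, can be illustrated with a minimal sketch. This is not the authors' implementation. According to the abstract, VPNQ learns its personalized codebook as part of training; here plain k-means over one subject's patch features stands in for that step. The functions `fit_personal_codebook` and `quantize` are hypothetical names, and patch features are assumed to be precomputed fixed-length vectors.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(features: List[Vector], codebook: List[Vector]) -> Tuple[List[int], List[Vector]]:
    """Replace each patch feature with its nearest codebook entry.

    Returns the discrete token indices and the corresponding codebook vectors.
    """
    indices = [
        min(range(len(codebook)), key=lambda k: squared_distance(f, codebook[k]))
        for f in features
    ]
    return indices, [codebook[k] for k in indices]


def fit_personal_codebook(features: List[Vector], k: int, iters: int = 10) -> List[List[float]]:
    """Fit a k-entry codebook to patch features from ONE subject.

    Stand-in assumption: plain k-means (Lloyd's algorithm) replaces the
    learned codebook training described in the paper.
    """
    if not 0 < k <= len(features):
        raise ValueError("k must be between 1 and the number of features")
    # Deterministic initialisation from the first k features (a hypothetical choice).
    codebook = [list(f) for f in features[:k]]
    for _ in range(iters):
        indices, _ = quantize(features, codebook)
        for j in range(k):
            members = [features[i] for i, idx in enumerate(indices) if idx == j]
            if members:  # an entry with no assigned patches is left unchanged
                codebook[j] = [sum(m[d] for m in members) / len(members)
                               for d in range(len(members[0]))]
    return codebook


# Hypothetical 2-D patch features from one subject's training frames.
subject_patches = [[0, 0], [0.2, 0.1], [5, 5], [5.1, 4.9], [0.1, 0.2], [4.9, 5.2]]
codebook = fit_personal_codebook(subject_patches, k=2)
tokens, _ = quantize([[0.3, 0.0], [4.8, 5.1]], codebook)
print(tokens)  # [0, 1]
```

The token indices are the discrete targets that, per the abstract, VPNQ's Spatio-Temporal Transformer learns to predict from projected landmarks; a decoder then maps the looked-up codebook vectors back to pixels. Fitting the codebook to one identity, rather than to many, keeps every entry close to that person's own local textures, which is the motivation the abstract gives for a personalized codebook.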

Cite

Text

Wang et al. "Robust Video Portrait Reenactment via Personalized Representation Quantization." AAAI Conference on Artificial Intelligence, 2023. doi:10.1609/AAAI.V37I2.25354

Markdown

[Wang et al. "Robust Video Portrait Reenactment via Personalized Representation Quantization." AAAI Conference on Artificial Intelligence, 2023.](https://mlanthology.org/aaai/2023/wang2023aaai-robust-a/) doi:10.1609/AAAI.V37I2.25354

BibTeX

@inproceedings{wang2023aaai-robust-a,
  title     = {{Robust Video Portrait Reenactment via Personalized Representation Quantization}},
  author    = {Wang, Kaisiyuan and Liang, Changcheng and Zhou, Hang and Tang, Jiaxiang and Wu, Qianyi and He, Dongliang and Hong, Zhibin and Liu, Jingtuo and Ding, Errui and Liu, Ziwei and Wang, Jingdong},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2023},
  pages     = {2564-2572},
  doi       = {10.1609/AAAI.V37I2.25354},
  url       = {https://mlanthology.org/aaai/2023/wang2023aaai-robust-a/}
}