Occlusion-Insensitive Talking Head Video Generation via Facelet Compensation
Abstract
Talking head video generation involves animating a still face image using facial motion cues derived from a driving video to replicate target poses and expressions. Traditional methods often rely on the assumption that the relative positions of facial keypoints remain unchanged. However, this assumption fails when keypoints are occluded or when the head is in a profile pose, leading to inconsistencies in identity and blurring in certain facial regions. In this paper, we introduce Occlusion-Insensitive Talking Head Video Generation, a novel approach that eliminates the reliance on spatial correlation of keypoints and instead leverages semantic correlation. Our method transforms facial features into a facelet semantic bank, where each facelet token represents a specific facial semantic. This bank is devoid of spatial information, allowing it to compensate for any invisible or occluded face regions during motion warping. The facelet compensation module then populates the facelet tokens within the initially warped features by learning a correlation matrix between facial semantics and the facelet bank. This approach enables precise compensation for occlusions and pose changes, enhancing the fidelity of the generated videos. Extensive experiments demonstrate that our method achieves state-of-the-art results, preserving source identity, maintaining fine-grained facial details, and capturing nuanced facial expressions with remarkable accuracy.
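The compensation step described above can be pictured as an attention-style lookup. The following is a minimal NumPy sketch, not the authors' implementation. It assumes the correlation matrix is a softmax over scaled dot products between each position of the warped feature map and each facelet token, and that the retrieved facelets are added back residually. The function name `facelet_compensation`, the tensor shapes, and the residual addition are all illustrative assumptions; the abstract does not specify them.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def facelet_compensation(warped, facelets):
    """Hypothetical sketch of facelet compensation.

    warped:   (H, W, C) feature map after motion warping (may contain
              poorly filled regions where the source face was occluded).
    facelets: (K, C) semantic bank; tokens carry no spatial layout.
    Returns the compensated (H, W, C) map and the (H*W, K) correlation matrix.
    """
    h, w, c = warped.shape
    queries = warped.reshape(h * w, c)

    # Assumed form of the correlation matrix: one row per spatial position,
    # one column per facelet token, each row summing to 1.
    corr = softmax(queries @ facelets.T / np.sqrt(c), axis=-1)

    # Each position retrieves a weighted mix of facelet tokens.
    filled = corr @ facelets

    # Residual addition is an assumption made for this sketch.
    compensated = (queries + filled).reshape(h, w, c)
    return compensated, corr

rng = np.random.default_rng(0)
warped = rng.normal(size=(4, 4, 8))    # toy warped features
facelets = rng.normal(size=(5, 8))     # toy bank of 5 facelet tokens
compensated, corr = facelet_compensation(warped, facelets)
```

Because the bank is treated as an unordered set of tokens, reordering the rows of `facelets` leaves `compensated` unchanged in this sketch, which mirrors the abstract's point that the bank is devoid of spatial information.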
Cite
Text
Deng et al. "Occlusion-Insensitive Talking Head Video Generation via Facelet Compensation." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I3.32277
Markdown
[Deng et al. "Occlusion-Insensitive Talking Head Video Generation via Facelet Compensation." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/deng2025aaai-occlusion/) doi:10.1609/AAAI.V39I3.32277
BibTeX
@inproceedings{deng2025aaai-occlusion,
title = {{Occlusion-Insensitive Talking Head Video Generation via Facelet Compensation}},
author = {Deng, Yuhui and Lu, Yuqin and Xu, Yangyang and Nie, Yongwei and He, Shengfeng},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2025},
pages = {2726--2734},
doi = {10.1609/AAAI.V39I3.32277},
url = {https://mlanthology.org/aaai/2025/deng2025aaai-occlusion/}
}