SyncGaussian: Stable 3D Gaussian-Based Talking Head Generation with Enhanced Lip Sync via Discriminative Speech Features
Abstract
Generating high-fidelity talking heads that maintain stable head poses and achieve robust lip sync remains a significant challenge. Although methods based on 3D Gaussian Splatting (3DGS) offer a promising solution via point-based deformation, they suffer from inconsistent head dynamics and mismatched mouth movements due to unstable Gaussian initialization and incomplete speech features. To overcome these limitations, we introduce SyncGaussian, a 3DGS-based framework that ensures stable head poses, enhanced lip sync, and realistic appearances with real-time rendering. SyncGaussian employs a stable head Gaussian initialization strategy to mitigate head jitter by optimizing commonly used rough head pose parameters. To enhance lip sync, we propose a sync-enhanced encoder that leverages audio-to-text and audio-to-visual speech features. Guided by a tailored cosine similarity loss function, the encoder integrates discriminative speech features through a multi-level sync adaptation mechanism, enabling the learning of an adaptive speech feature space. Extensive experiments demonstrate that SyncGaussian outperforms state-of-the-art methods in image quality, dynamic motion, and lip sync, with the potential for real-time applications.
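The abstract mentions a tailored cosine similarity loss guiding the sync-enhanced encoder. The paper's exact formulation is not given here, but a common form of such a loss pulls paired audio and visual features together by penalizing low cosine similarity. A minimal sketch (all names and the `1 - cos` form are assumptions, not the authors' definition):

```python
import math

def cosine_sync_loss(audio_feat, visual_feat, eps=1e-8):
    """Hypothetical sync loss: 1 - cosine similarity between an
    audio feature vector and its paired visual feature vector.
    Returns 0 for perfectly aligned features, up to 2 for opposed ones."""
    dot = sum(a * v for a, v in zip(audio_feat, visual_feat))
    norm_a = math.sqrt(sum(a * a for a in audio_feat))
    norm_v = math.sqrt(sum(v * v for v in visual_feat))
    return 1.0 - dot / (norm_a * norm_v + eps)
```

In practice such a loss would be computed batch-wise over learned embeddings (e.g. in PyTorch) and combined with reconstruction losses; this scalar version only illustrates the shape of the objective.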
Cite
Text
Liu et al. "SyncGaussian: Stable 3D Gaussian-Based Talking Head Generation with Enhanced Lip Sync via Discriminative Speech Features." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/176
Markdown
[Liu et al. "SyncGaussian: Stable 3D Gaussian-Based Talking Head Generation with Enhanced Lip Sync via Discriminative Speech Features." International Joint Conference on Artificial Intelligence, 2025.](https://mlanthology.org/ijcai/2025/liu2025ijcai-syncgaussian/) doi:10.24963/IJCAI.2025/176
BibTeX
@inproceedings{liu2025ijcai-syncgaussian,
title = {{SyncGaussian: Stable 3D Gaussian-Based Talking Head Generation with Enhanced Lip Sync via Discriminative Speech Features}},
author = {Liu, Ke and Wei, Jiwei and He, Shiyuan and Ma, Zeyu and Zhang, Chaoning and Xie, Ning and Yang, Yang},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2025},
pages = {1576--1584},
doi = {10.24963/IJCAI.2025/176},
url = {https://mlanthology.org/ijcai/2025/liu2025ijcai-syncgaussian/}
}