SyncGaussian: Stable 3D Gaussian-Based Talking Head Generation with Enhanced Lip Sync via Discriminative Speech Features
Abstract
Generating high-fidelity talking heads that maintain stable head poses and achieve robust lip sync remains a significant challenge. Although methods based on 3D Gaussian Splatting (3DGS) offer a promising solution via point-based deformation, they suffer from inconsistent head dynamics and mismatched mouth movements due to unstable Gaussian initialization and incomplete speech features. To overcome these limitations, we introduce SyncGaussian, a 3DGS-based framework that ensures stable head poses, enhanced lip sync, and realistic appearances with real-time rendering. SyncGaussian employs a stable head Gaussian initialization strategy to mitigate head jitter by optimizing commonly used rough head pose parameters. To enhance lip sync, we propose a sync-enhanced encoder that leverages audio-to-text and audio-to-visual speech features. Guided by a tailored cosine similarity loss function, the encoder integrates discriminative speech features through a multi-level sync adaptation mechanism, enabling the learning of an adaptive speech feature space. Extensive experiments demonstrate that SyncGaussian outperforms state-of-the-art methods in image quality, dynamic motion, and lip sync, with the potential for real-time applications.
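The abstract mentions a tailored cosine similarity loss guiding the sync-enhanced encoder. The paper's exact formulation is not given here, but a common form of such a loss pulls paired audio and visual features together by penalizing low cosine similarity. A minimal sketch (all names and the `1 - cos` form are assumptions, not the authors' definition):

```python
import math

def cosine_sync_loss(audio_feat, visual_feat, eps=1e-8):
    """Hypothetical sync loss: 1 - cosine similarity between an
    audio feature vector and its paired visual feature vector.
    Returns 0 for perfectly aligned features, up to 2 for opposed ones."""
    dot = sum(a * v for a, v in zip(audio_feat, visual_feat))
    norm_a = math.sqrt(sum(a * a for a in audio_feat))
    norm_v = math.sqrt(sum(v * v for v in visual_feat))
    return 1.0 - dot / (norm_a * norm_v + eps)
```

In practice such a loss would be computed batch-wise over learned embeddings (e.g. in PyTorch) and combined with reconstruction losses; this scalar version only illustrates the shape of the objective.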
Cite
Text
Liu et al. "SyncGaussian: Stable 3D Gaussian-Based Talking Head Generation with Enhanced Lip Sync via Discriminative Speech Features." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/176
Markdown
[Liu et al. "SyncGaussian: Stable 3D Gaussian-Based Talking Head Generation with Enhanced Lip Sync via Discriminative Speech Features." International Joint Conference on Artificial Intelligence, 2025.](https://mlanthology.org/ijcai/2025/liu2025ijcai-syncgaussian/) doi:10.24963/IJCAI.2025/176
BibTeX
@inproceedings{liu2025ijcai-syncgaussian,
title = {{SyncGaussian: Stable 3D Gaussian-Based Talking Head Generation with Enhanced Lip Sync via Discriminative Speech Features}},
author = {Liu, Ke and Wei, Jiwei and He, Shiyuan and Ma, Zeyu and Zhang, Chaoning and Xie, Ning and Yang, Yang},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2025},
pages = {1576--1584},
doi = {10.24963/IJCAI.2025/176},
url = {https://mlanthology.org/ijcai/2025/liu2025ijcai-syncgaussian/}
}