MoCha: Towards Movie-Grade Talking Character Generation
Abstract
Recent advancements in video generation have achieved impressive motion realism, yet they often overlook character-driven storytelling, a crucial task for automated film and animation generation. We introduce Talking Characters, a more realistic task of generating talking character animations directly from speech and text. Unlike talking head tasks, Talking Characters aims to generate the full portrait of one or more characters beyond the facial region. In this paper, we propose MoCha, the first model of its kind to generate talking characters. To ensure precise synchronization between video and speech, we propose a localized audio attention mechanism that effectively aligns speech and video tokens. To address the scarcity of large-scale speech-labelled video datasets, we introduce a joint training strategy that leverages both speech-labelled and text-labelled video data, significantly improving generalization across diverse character actions. We also design structured prompt templates with character tags, enabling, for the first time, multi-character conversation with turn-based dialogue, allowing AI-generated characters to engage in context-aware conversations with cinematic coherence. Extensive qualitative and quantitative evaluations, including human evaluation studies and benchmark comparisons, demonstrate that MoCha sets a new standard for AI-generated cinematic storytelling, achieving superior realism, controllability, and generalization.
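The localized audio attention described above restricts each video token to attend only to temporally nearby speech tokens. As a minimal illustrative sketch, one plausible form is windowed cross-attention, where each video frame index is mapped to an aligned audio position and attends within a fixed window; all function names, shapes, the window size, and the linear frame-to-audio alignment are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def localized_audio_attention(video_tokens, audio_tokens, window=2):
    """Windowed cross-attention sketch: each video frame attends only to
    audio tokens inside a local temporal window instead of the full
    audio sequence. video_tokens: (T_v, d); audio_tokens: (T_a, d).
    All names and the window size are illustrative assumptions."""
    T_v, d = video_tokens.shape
    T_a = audio_tokens.shape[0]
    out = np.zeros_like(video_tokens)
    for t in range(T_v):
        # Map the video frame index to its aligned audio position
        # (assumed linear alignment between the two timelines).
        center = int(round(t * (T_a - 1) / max(T_v - 1, 1)))
        lo, hi = max(0, center - window), min(T_a, center + window + 1)
        keys = audio_tokens[lo:hi]                     # (w, d) local window
        scores = keys @ video_tokens[t] / np.sqrt(d)   # scaled dot-product
        weights = np.exp(scores - scores.max())        # stable softmax
        weights /= weights.sum()
        out[t] = weights @ keys                        # attended audio context
    return out
```

The locality constraint keeps a frame's attention focused on the speech segment actually being uttered at that moment, which is the alignment property the abstract attributes to the mechanism.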
Cite
Text
Wei et al. "MoCha: Towards Movie-Grade Talking Character Generation." Advances in Neural Information Processing Systems, 2025.
BibTeX
@inproceedings{wei2025neurips-mocha,
title = {{MoCha: Towards Movie-Grade Talking Character Generation}},
author = {Wei, Cong and Sun, Bo and Ma, Haoyu and Hou, Ji and Juefei-Xu, Felix and He, Zecheng and Dai, Xiaoliang and Zhang, Luxin and Li, Kunpeng and Hou, Tingbo and Sinha, Animesh and Vajda, Peter and Chen, Wenhu},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
url = {https://mlanthology.org/neurips/2025/wei2025neurips-mocha/}
}