RealPortrait: Realistic Portrait Animation with Diffusion Transformers
Abstract
We introduce RealPortrait, a framework based on Diffusion Transformers (DiT) for generating highly expressive and visually appealing portrait animations. Given a static portrait image, our method transfers complex facial expressions and head pose movements extracted from a driving video onto the portrait, turning it into a lifelike video. Specifically, we exploit the strong spatial-temporal modeling capabilities of DiT to generate portrait videos that preserve high-fidelity visual details and remain temporally coherent. In contrast to conventional image-to-video generation frameworks that require a separate reference network, we incorporate an efficient reference attention mechanism within the DiT backbone, which removes the computational overhead of that extra network and achieves superior preservation of the reference appearance. In addition, we integrate a parallel ControlNet to precisely regulate intricate facial expressions and head poses. Unlike prior methods that rely on explicit sparse motion representations, such as facial landmarks or 3DMM coefficients, we adopt a dense implicit motion representation as the control guidance. This implicit representation excels at capturing nuanced emotional facial expressions and subtle non-rigid lip dynamics. To further improve the model's generalization, we augment the training dataset with a substantial volume of facial image data through random crop augmentation, which makes the model robust across a wide variety of facial appearances and expressions. Empirical evaluations demonstrate that RealPortrait generates highly realistic portrait animations with exceptional temporal consistency in preserving the subject's appearance.
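The abstract does not spell out how RealPortrait's reference attention is built. One common way to inject a reference image without a separate reference network is to let the video tokens attend to the reference-image tokens inside the same self-attention layer: queries come from the video tokens only, while keys and values come from the video tokens concatenated with the reference tokens. The plain-Python sketch below illustrates that general idea under this assumption. The function names, the single-head layout, and the projection matrices `w_q`, `w_k`, `w_v` are hypothetical and are not the paper's published implementation.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain (n x k) @ (k x m) matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def reference_attention(frame_tokens: Matrix, ref_tokens: Matrix,
                        w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Hypothetical single-head reference attention (an assumed formulation).

    Queries are projected from the video (frame) tokens only; keys and values
    are projected from frame tokens concatenated with reference-image tokens,
    so every frame token can read appearance information from the reference
    without running a second, separate reference network.
    """
    q = matmul(frame_tokens, w_q)
    context = frame_tokens + ref_tokens  # token-wise concatenation
    k = matmul(context, w_k)
    v = matmul(context, w_v)
    scale = 1.0 / math.sqrt(len(q[0]))  # standard 1/sqrt(d) scaling
    out = []
    for q_row in q:
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        weights = softmax(scores)
        # Weighted sum of value vectors over frame + reference tokens.
        out.append([sum(w * v_row[j] for w, v_row in zip(weights, v))
                    for j in range(len(v[0]))])
    return out


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    frame = [[1.0, 0.0]]
    reference = [[0.0, 1.0]]
    print(reference_attention(frame, reference, identity, identity, identity))
    print(reference_attention(frame, [], identity, identity, identity))
```

With an empty reference the frame token attends only to itself, whereas adding a reference token mixes part of the reference's value vector into the output. Because the reference tokens share the backbone's own projections, this style of design avoids a duplicate network, which is consistent with the overhead argument in the abstract.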
Cite
Text
Yang et al. "RealPortrait: Realistic Portrait Animation with Diffusion Transformers." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I9.33012
Markdown
[Yang et al. "RealPortrait: Realistic Portrait Animation with Diffusion Transformers." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/yang2025aaai-realportrait/) doi:10.1609/AAAI.V39I9.33012
BibTeX
@inproceedings{yang2025aaai-realportrait,
title = {{RealPortrait: Realistic Portrait Animation with Diffusion Transformers}},
author = {Yang, Zejun and Wei, Huawei and Wang, Zhisheng},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2025},
pages = {9345-9353},
doi = {10.1609/AAAI.V39I9.33012},
url = {https://mlanthology.org/aaai/2025/yang2025aaai-realportrait/}
}