MimicMotion: High-Quality Human Motion Video Generation with Confidence-Aware Pose Guidance

Abstract

Generative AI has advanced significantly in image generation in recent years, but video generation still faces challenges in controllability, length, and detail quality that hinder its application. We present MimicMotion, a framework for generating high-quality human videos of arbitrary length using motion guidance. Our approach makes three contributions. First, we introduce confidence-aware pose guidance that ensures high frame quality and temporal smoothness. Second, we introduce regional loss amplification based on pose confidence, which reduces image distortion in key regions. Third, we propose a progressive latent fusion strategy to generate long and smooth videos. Experiments demonstrate the effectiveness of our approach in producing high-quality human motion videos. Videos and comparisons are available at https://tencent.github.io/MimicMotion.
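
The idea of regional loss amplification based on pose confidence can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formula, so the box-shaped regions, the confidence threshold, the amplification factor, and the function names `amplified_weight_map` and `weighted_mse` are all assumptions made for illustration. The sketch builds a per-pixel weight map that is 1.0 everywhere except inside regions whose pose confidence passes the threshold, then uses it to scale a squared-error loss.

```python
from typing import List, Tuple

# (top, left, bottom, right); bottom and right are exclusive. Hypothetical region format.
Box = Tuple[int, int, int, int]


def amplified_weight_map(
    height: int,
    width: int,
    regions: List[Tuple[Box, float]],
    threshold: float = 0.8,       # assumed confidence cutoff, not from the paper
    amplification: float = 10.0,  # assumed amplification factor, not from the paper
) -> List[List[float]]:
    """Return a per-pixel loss weight map.

    Every pixel starts at weight 1.0. Pixels inside a region whose pose
    confidence is at least `threshold` get weight `amplification`.
    """
    weights = [[1.0] * width for _ in range(height)]
    for (top, left, bottom, right), confidence in regions:
        if confidence < threshold:
            continue  # low-confidence detections are left unamplified
        for y in range(max(top, 0), min(bottom, height)):
            for x in range(max(left, 0), min(right, width)):
                weights[y][x] = amplification
    return weights


def weighted_mse(
    pred: List[List[float]],
    target: List[List[float]],
    weights: List[List[float]],
) -> float:
    """Mean over all pixels of the weight-scaled squared error."""
    total = 0.0
    count = 0
    for p_row, t_row, w_row in zip(pred, target, weights):
        for p, t, w in zip(p_row, t_row, w_row):
            total += w * (p - t) ** 2
            count += 1
    return total / count


if __name__ == "__main__":
    # One confidently detected region (e.g. a hand) and one uncertain one.
    regions = [((0, 0, 1, 1), 0.9), ((1, 1, 2, 2), 0.3)]
    w = amplified_weight_map(2, 2, regions)
    pred = [[1.0, 1.0], [1.0, 1.0]]
    target = [[0.0, 0.0], [0.0, 0.0]]
    print(w)
    print(weighted_mse(pred, target, w))
```

Under these assumptions, errors inside confidently detected regions contribute more to the loss, while regions with unreliable pose estimates are treated like the rest of the frame.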

Cite

Text

Zhang et al. "MimicMotion: High-Quality Human Motion Video Generation with Confidence-Aware Pose Guidance." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Zhang et al. "MimicMotion: High-Quality Human Motion Video Generation with Confidence-Aware Pose Guidance." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/zhang2025icml-mimicmotion/)

BibTeX

@inproceedings{zhang2025icml-mimicmotion,
  title     = {{MimicMotion: High-Quality Human Motion Video Generation with Confidence-Aware Pose Guidance}},
  author    = {Zhang, Yuang and Gu, Jiaxi and Wang, Li-Wen and Wang, Han and Cheng, Junqi and Zhu, Yuefeng and Zou, Fangyuan},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {74896--74910},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/zhang2025icml-mimicmotion/}
}