Animating the Uncaptured: Humanoid Mesh Animation with Video Diffusion Models
Abstract
Animation of humanoid characters is essential in many graphics applications, but creating realistic animations requires significant time and cost. We propose an approach to synthesize 4D animated sequences of input static 3D humanoid meshes by leveraging strong, generalized motion priors from generative video models, since such models capture a wide variety of human motions. From an input static 3D humanoid mesh and a text prompt describing the desired animation, we synthesize a corresponding video conditioned on a rendered image of the 3D mesh. We then use an underlying SMPL representation and a motion optimization to animate the 3D mesh according to the motion in the generated video. This provides a cost-effective and accessible way to synthesize diverse and realistic 4D animations.
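The abstract does not say exactly how motion recovered on the SMPL body is transferred to the input mesh. One common way to drive an arbitrary mesh with an SMPL-like skeleton is linear blend skinning (LBS). In LBS, each mesh vertex is attached to the skeleton joints by skinning weights (for example, weights copied from the nearest SMPL vertex), and each posed vertex is computed as v'_i = sum_j w_ij (T_j v_i). The Python sketch below shows only LBS under that assumption. It is not the authors' method. The helper names `rot_z`, `apply` and `lbs` and the toy two-joint "arm" are hypothetical, and real SMPL joint transforms would come from fitted pose parameters rather than being hand-written.

```python
import math
from typing import List, Sequence

Mat4 = List[List[float]]  # 4x4 homogeneous rigid transform, row-major
Vec3 = Sequence[float]


def rot_z(theta: float, pivot: Vec3 = (0.0, 0.0, 0.0)) -> Mat4:
    """Rotate by `theta` radians about the z-axis passing through `pivot`."""
    c, s = math.cos(theta), math.sin(theta)
    px, py, _ = pivot
    # x' = c(x - px) - s(y - py) + px,  y' = s(x - px) + c(y - py) + py
    return [
        [c, -s, 0.0, px - c * px + s * py],
        [s, c, 0.0, py - s * px - c * py],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


def apply(m: Mat4, v: Vec3) -> List[float]:
    """Apply a 4x4 homogeneous transform to a 3D point."""
    x, y, z = v
    return [m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3] for r in range(3)]


def lbs(vertices: Sequence[Vec3], weights: Sequence[Sequence[float]],
        joint_transforms: Sequence[Mat4]) -> List[List[float]]:
    """Linear blend skinning for one frame: v'_i = sum_j w_ij * (T_j v_i).

    vertices: rest-pose vertex positions of the input mesh.
    weights: per-vertex skinning weights over joints (each row sums to 1).
    joint_transforms: rest-to-posed transform of each joint for this frame
        (assumed here to come from an already-fitted SMPL pose).
    """
    posed = []
    for v, w in zip(vertices, weights):
        out = [0.0, 0.0, 0.0]
        for wj, tj in zip(w, joint_transforms):
            if wj == 0.0:
                continue  # joint has no influence on this vertex
            p = apply(tj, v)
            for k in range(3):
                out[k] += wj * p[k]
        posed.append(out)
    return posed


if __name__ == "__main__":
    # Toy "arm": joint 0 is the fixed upper arm, joint 1 bends 90 degrees at (1, 0, 0).
    rest = [(0.5, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    w = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
    frame = [rot_z(0.0), rot_z(math.pi / 2, pivot=(1.0, 0.0, 0.0))]
    for p in lbs(rest, w, frame):
        print([round(c, 6) for c in p])
```

To animate a whole sequence under this assumption, `lbs` would be called once per frame with that frame's joint transforms, while the rest-pose vertices and skinning weights stay fixed.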
Cite
Text
Millán et al. "Animating the Uncaptured: Humanoid Mesh Animation with Video Diffusion Models." International Conference on Learning Representations, 2026.
Markdown
[Millán et al. "Animating the Uncaptured: Humanoid Mesh Animation with Video Diffusion Models." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/millan2026iclr-animating/)
BibTeX
@inproceedings{millan2026iclr-animating,
title = {{Animating the Uncaptured: Humanoid Mesh Animation with Video Diffusion Models}},
author = {Millán, Marc Benedí San and Dai, Angela and Nießner, Matthias},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/millan2026iclr-animating/}
}