Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics
Abstract
We introduce Puppet-Master, an interactive video generator that captures the internal, part-level motion of objects, serving as a proxy for modeling object dynamics universally. Given an image of an object and a set of "drags" specifying the trajectory of a few points on the object, the model synthesizes a video where the object's parts move accordingly. To build Puppet-Master, we extend a pre-trained image-to-video generator to encode the input drags. We also propose all-to-first attention, an alternative to conventional spatial attention that mitigates artifacts caused by fine-tuning a video generator on out-of-domain data. The model is fine-tuned on Objaverse-Animation-HQ, a new dataset of curated part-level motion clips obtained by rendering synthetic 3D animations. Unlike real videos, these synthetic clips avoid confounding part-level motion with overall object and camera motion. We extensively filter sub-optimal animations and augment the synthetic renderings with meaningful drags that emphasize the internal dynamics of objects. We demonstrate that Puppet-Master learns to generate part-level motions, unlike other motion-conditioned video generators that primarily move the object as a whole. Moreover, Puppet-Master generalizes well to out-of-domain real images, outperforming existing methods on real-world benchmarks in a zero-shot manner.
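As a rough illustration of the "drags" input described in the abstract, the sketch below represents each drag as a start and end point and expands it into a per-frame trajectory. The `Drag` class, the pixel-coordinate convention, and the linear interpolation are all assumptions made for this example; the abstract does not specify how Puppet-Master encodes drags internally.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Drag:
    """Hypothetical container for one dragged point (pixel coordinates assumed)."""
    start: Point
    end: Point

    def trajectory(self, num_frames: int) -> List[Point]:
        """Return one (x, y) position per frame.

        Linear interpolation between start and end is an assumption of this
        sketch, not something stated in the abstract.
        """
        if num_frames < 2:
            raise ValueError("need at least two frames")
        (x0, y0), (x1, y1) = self.start, self.end
        return [
            (x0 + (x1 - x0) * t / (num_frames - 1),
             y0 + (y1 - y0) * t / (num_frames - 1))
            for t in range(num_frames)
        ]


# A set of drags for one image: here, a single point moved 20 pixels to the right.
drags = [Drag(start=(10.0, 20.0), end=(30.0, 20.0))]
tracks = [d.trajectory(num_frames=5) for d in drags]
```

Here `tracks` holds one list of positions per drag, which is the kind of conditioning signal a drag-conditioned video generator could consume alongside the input image.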
Cite
Text
Li et al. "Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics." International Conference on Computer Vision, 2025.
Markdown
[Li et al. "Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/li2025iccv-puppetmaster/)
BibTeX
@inproceedings{li2025iccv-puppetmaster,
title = {{Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics}},
author = {Li, Ruining and Zheng, Chuanxia and Rupprecht, Christian and Vedaldi, Andrea},
booktitle = {International Conference on Computer Vision},
year = {2025},
pages = {13405--13415},
url = {https://mlanthology.org/iccv/2025/li2025iccv-puppetmaster/}
}