MotionCraft: Crafting Whole-Body Motion with Plug-and-Play Multimodal Controls
Abstract
Whole-body multimodal motion generation, controlled by text, speech, or music, has numerous applications including video generation and character animation. However, employing a unified model to process different condition modalities presents two main challenges: motion distribution drifts across different tasks (e.g., co-speech gestures and text-driven daily actions) and the complex optimization of mixed conditions with varying granularities (e.g., text and audio). In this paper, we propose MotionCraft, a unified diffusion transformer that crafts whole-body motion with plug-and-play multimodal control. Our framework employs a coarse-to-fine training strategy, starting with the text-to-motion semantic pre-training, followed by the multimodal low-level control adaptation. To effectively learn and transfer motion knowledge across different distributions, we design MC-Attn for parallel modeling of static and dynamic human topology graphs. To overcome the motion format inconsistency of existing benchmarks, we introduce MC-Bench, the first available multimodal whole-body motion generation benchmark based on the unified SMPL-X format. Extensive experiments show that MotionCraft achieves state-of-the-art performance on various standard motion generation tasks.
Cite
Text
Bian et al. "MotionCraft: Crafting Whole-Body Motion with Plug-and-Play Multimodal Controls." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I2.32183
Markdown
[Bian et al. "MotionCraft: Crafting Whole-Body Motion with Plug-and-Play Multimodal Controls." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/bian2025aaai-motioncraft/) doi:10.1609/AAAI.V39I2.32183
BibTeX
@inproceedings{bian2025aaai-motioncraft,
title = {{MotionCraft: Crafting Whole-Body Motion with Plug-and-Play Multimodal Controls}},
author = {Bian, Yuxuan and Zeng, Ailing and Ju, Xuan and Liu, Xian and Zhang, Zhaoyang and Liu, Wei and Xu, Qiang},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2025},
pages = {1880--1888},
doi = {10.1609/AAAI.V39I2.32183},
url = {https://mlanthology.org/aaai/2025/bian2025aaai-motioncraft/}
}