MG-MotionLLM: A Unified Framework for Motion Comprehension and Generation Across Multiple Granularities

Abstract

Recent motion-aware large language models have demonstrated promising potential in unifying motion comprehension and generation. However, existing approaches primarily focus on coarse-grained motion-text modeling, where text describes the overall semantics of an entire motion sequence in just a few words. This limits their ability to handle fine-grained motion-relevant tasks, such as understanding and controlling the movements of specific body parts. To overcome this limitation, we pioneer MG-MotionLLM, a unified motion-language model for multi-granular motion comprehension and generation. We further introduce a comprehensive multi-granularity training scheme that incorporates a set of novel auxiliary tasks, such as localizing the temporal boundaries of motion segments from detailed text and detailed motion captioning, to facilitate mutual reinforcement of motion-text modeling across levels of granularity. Extensive experiments show that MG-MotionLLM achieves superior performance on classical text-to-motion and motion-to-text tasks, and shows promise on novel fine-grained motion comprehension and editing tasks. Project page: CVI-SZU/MG-MotionLLM

Cite

Text

Wu et al. "MG-MotionLLM: A Unified Framework for Motion Comprehension and Generation Across Multiple Granularities." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.02593

Markdown

[Wu et al. "MG-MotionLLM: A Unified Framework for Motion Comprehension and Generation Across Multiple Granularities." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/wu2025cvpr-mgmotionllm/) doi:10.1109/CVPR52734.2025.02593

BibTeX

@inproceedings{wu2025cvpr-mgmotionllm,
  title     = {{MG-MotionLLM: A Unified Framework for Motion Comprehension and Generation Across Multiple Granularities}},
  author    = {Wu, Bizhu and Xie, Jinheng and Shen, Keming and Kong, Zhe and Ren, Jianfeng and Bai, Ruibin and Qu, Rong and Shen, Linlin},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {27849--27858},
  doi       = {10.1109/CVPR52734.2025.02593},
  url       = {https://mlanthology.org/cvpr/2025/wu2025cvpr-mgmotionllm/}
}