Go to Zero: Towards Zero-Shot Motion Generation with Million-Scale Data

ICCV 2025 pp. 13336-13348

Abstract

Generating diverse and natural human motion sequences from textual descriptions is a fundamental and challenging research problem in computer vision, graphics, and robotics. Despite significant advances in this field, current methods often struggle with zero-shot generalization, largely because of the limited size of their training datasets. Moreover, the lack of a comprehensive evaluation framework hinders progress on this task because it fails to identify directions for improvement. In this work, we aim to push text-to-motion into a new era, namely one that achieves zero-shot generalization. To this end, we first develop an efficient annotation pipeline and introduce MotionMillion, the largest human motion dataset to date, featuring over 2,000 hours and 2 million high-quality motion sequences. Additionally, we propose MotionMillion-Eval, the most comprehensive benchmark for evaluating zero-shot motion generation. Leveraging a scalable architecture, we scale our model to 7B parameters and validate its performance on MotionMillion-Eval. Our results demonstrate strong generalization to out-of-domain and complex compositional motions, marking a significant step toward zero-shot human motion generation.

Cite

Text

Fan et al. "Go to Zero: Towards Zero-Shot Motion Generation with Million-Scale Data." International Conference on Computer Vision, 2025.

Markdown

[Fan et al. "Go to Zero: Towards Zero-Shot Motion Generation with Million-Scale Data." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/fan2025iccv-go/)

BibTeX

@inproceedings{fan2025iccv-go,
  title     = {{Go to Zero: Towards Zero-Shot Motion Generation with Million-Scale Data}},
  author    = {Fan, Ke and Lu, Shunlin and Dai, Minyue and Yu, Runyi and Xiao, Lixing and Dou, Zhiyang and Dong, Junting and Ma, Lizhuang and Wang, Jingbo},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {13336-13348},
  url       = {https://mlanthology.org/iccv/2025/fan2025iccv-go/}
}