AToM: Aligning Text-to-Motion Model at Event-Level with GPT-4Vision Reward

Abstract

Recently, text-to-motion models have opened new possibilities for creating realistic human motion with greater efficiency and flexibility. However, aligning motion generation with event-level textual descriptions presents unique challenges due to the complex, nuanced relationship between textual prompts and desired motion outcomes. To address this issue, we introduce AToM, a framework that enhances the alignment between generated motion and text prompts by leveraging reward from GPT-4Vision. AToM comprises three main stages. First, we construct a dataset, MotionPrefer, that pairs three types of event-level textual prompts with generated motions; the prompts cover the integrity, temporal relationship, and frequency of motion. Second, we design a paradigm that uses GPT-4Vision for detailed motion annotation, including visual data formatting, task-specific instructions, and scoring rules for each sub-task. Finally, we fine-tune an existing text-to-motion model using reinforcement learning guided by this paradigm. Experimental results demonstrate that AToM significantly improves the event-level alignment quality of text-to-motion generation.

Cite

Text

Han et al. "AToM: Aligning Text-to-Motion Model at Event-Level with GPT-4Vision Reward." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.02118

Markdown

[Han et al. "AToM: Aligning Text-to-Motion Model at Event-Level with GPT-4Vision Reward." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/han2025cvpr-atom/) doi:10.1109/CVPR52734.2025.02118

BibTeX

@inproceedings{han2025cvpr-atom,
  title     = {{AToM: Aligning Text-to-Motion Model at Event-Level with GPT-4Vision Reward}},
  author    = {Han, Haonan and Wu, Xiangzuo and Liao, Huan and Xu, Zunnan and Hu, Zhongyuan and Li, Ronghui and Zhang, Yachao and Li, Xiu},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {22746--22755},
  doi       = {10.1109/CVPR52734.2025.02118},
  url       = {https://mlanthology.org/cvpr/2025/han2025cvpr-atom/}
}