Motion-Aligned Word Embeddings for Text-to-Motion Generation
Abstract
Existing text-to-motion (T2M) generation models typically rely on pretrained large language models to encode textual inputs. However, these models, trained on generic text corpora, lack explicit alignment between motion-related words (e.g., "clockwise", "quickly") and human skeletal movements. This misalignment, fundamentally rooted in the word embedding layers, severely limits the ability of T2M models to understand and generalize fine-grained motion semantics. To tackle this issue, we propose Motion-Aligned Text Encoding (MATE), a novel framework that explicitly incorporates motion semantics into the word embedding layers of large language models to enhance text-motion alignment for motion generation. To address the challenge of inherent semantic entanglement in motion sequences, MATE introduces two key components: 1) a motion localization strategy that establishes localized correspondences between sub-texts and motion segments, enabling soft attention guidance for semantic localization; and 2) a motion disentanglement module that isolates word-specific motion semantics via contrastive kinematic prototypes, ensuring word-level alignment between linguistic and kinematic representations. Remarkably, language models enhanced with MATE can be seamlessly integrated into existing T2M methods, significantly surpassing state-of-the-art performance on two standard benchmarks with minimal modifications.
Cite
Text
Han et al. "Motion-Aligned Word Embeddings for Text-to-Motion Generation." International Conference on Learning Representations, 2026.
Markdown
[Han et al. "Motion-Aligned Word Embeddings for Text-to-Motion Generation." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/han2026iclr-motionaligned/)
BibTeX
@inproceedings{han2026iclr-motionaligned,
title = {{Motion-Aligned Word Embeddings for Text-to-Motion Generation}},
author = {Han, Ke and Lyu, Yueming and Sebe, Nicu},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/han2026iclr-motionaligned/}
}