Uniform Text-Motion Generation and Editing via Diffusion Model
Abstract
Diffusion models excel at controllable generation for continuous modalities, which makes them well suited to continuous motion generation. However, existing motion diffusion models are limited in flexibility: they focus solely on text-to-motion generation and lack motion editing capabilities. To address these issues, we introduce UniTMGE, a uniform text-motion generation and editing framework based on diffusion. UniTMGE overcomes single-modality limitations, enabling efficient and effective performance across multiple tasks such as text-driven motion generation, motion captioning, motion completion, and multi-modal motion editing. UniTMGE comprises three components: CTMV, which maps text and motion into a shared latent space using contrastive learning; a controllable diffusion model customized for the CTMV space; and MCRE, which unifies multimodal conditions into CLIP representations, enabling precise multimodal control and flexible motion editing through simple linear operations. We conduct both closed-world and open-world experiments on the Motion-X dataset with detailed text descriptions; the results demonstrate our model's effectiveness and generalizability across multiple tasks.
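The abstract describes aligning text and motion in a shared latent space via contrastive learning and performing edits as linear operations on embeddings. The sketch below illustrates that general idea only; it is not the authors' implementation, and all names and hyperparameters (the encoders, `contrastive_loss`, `edit_latent`, `temperature`, `alpha`) are hypothetical.

```python
# Minimal sketch (assumptions, not the paper's code): a symmetric contrastive
# loss for aligning paired text/motion embeddings in a shared space, and a
# linear edit that moves a motion latent along a text-defined direction.
import torch
import torch.nn.functional as F


def contrastive_loss(text_emb: torch.Tensor, motion_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE-style loss over a batch of paired embeddings."""
    text_emb = F.normalize(text_emb, dim=-1)
    motion_emb = F.normalize(motion_emb, dim=-1)
    logits = text_emb @ motion_emb.t() / temperature
    targets = torch.arange(text_emb.size(0), device=text_emb.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


def edit_latent(z_motion: torch.Tensor, z_src_text: torch.Tensor,
                z_tgt_text: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    """Linear edit in the shared space: shift the motion latent along the
    direction from the source description to the target description."""
    direction = F.normalize(z_tgt_text - z_src_text, dim=-1)
    return z_motion + alpha * direction
```

The edited latent would then be decoded back to motion (in the paper, presumably by the diffusion model conditioned on the CTMV latent); that decoding step is omitted here.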
Cite
Text
Wang et al. "Uniform Text-Motion Generation and Editing via Diffusion Model." NeurIPS 2024 Workshops: AFM, 2024.
Markdown
[Wang et al. "Uniform Text-Motion Generation and Editing via Diffusion Model." NeurIPS 2024 Workshops: AFM, 2024.](https://mlanthology.org/neuripsw/2024/wang2024neuripsw-uniform/)
BibTeX
@inproceedings{wang2024neuripsw-uniform,
title = {{Uniform Text-Motion Generation and Editing via Diffusion Model}},
author = {Wang, Ruoyu and Li, Xiang and Sun, Tengjiao and He, Yangfan and Shi, Tianyu and Yitingxie, },
booktitle = {NeurIPS 2024 Workshops: AFM},
year = {2024},
url = {https://mlanthology.org/neuripsw/2024/wang2024neuripsw-uniform/}
}