Generating Human Interaction Motions in Scenes with Text Control

Yi, Hongwei; Thies, Justus; Black, Michael J.; Peng, Xue Bin; Rempe, Davis

doi:10.1007/978-3-031-73235-5_14

ECCV 2024

Abstract

We present a text-controlled scene-aware motion generation method based on denoising diffusion models. Previous text-to-motion methods focus on characters in isolation without considering scenes due to the limited availability of datasets that include motion, text descriptions, and interactive scenes. Our approach begins with pre-training a scene-agnostic text-to-motion diffusion model, emphasizing goal-reaching constraints on large-scale motion-capture datasets. We then enhance this model with a scene-aware component, fine-tuned using data augmented with detailed scene information, including ground plane and object shapes. To facilitate training, we embed annotated navigation and interaction motions within scenes. The proposed method produces realistic and diverse human-object interactions, such as navigation and sitting, in different scenes with various object shapes, orientations, initial body positions, and poses. Extensive experiments demonstrate that our approach surpasses prior techniques in terms of the plausibility of human-scene interactions and the realism and variety of the generated motions. Code and data are available.

Cite

Text

Yi et al. "Generating Human Interaction Motions in Scenes with Text Control." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73235-5_14

Markdown

[Yi et al. "Generating Human Interaction Motions in Scenes with Text Control." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/yi2024eccv-generating/) doi:10.1007/978-3-031-73235-5_14

BibTeX

@inproceedings{yi2024eccv-generating,
  title     = {{Generating Human Interaction Motions in Scenes with Text Control}},
  author    = {Yi, Hongwei and Thies, Justus and Black, Michael J. and Peng, Xue Bin and Rempe, Davis},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-73235-5_14},
  url       = {https://mlanthology.org/eccv/2024/yi2024eccv-generating/}
}