Pose2Room: Understanding 3D Scenes from Human Activities

Abstract

With wearable IMU sensors, one can estimate human poses without requiring visual input. In this work, we pose the question: Can we reason about object structure in real-world environments solely from human trajectory information? Crucially, we observe that human motion and interactions tend to give strong cues about the objects in a scene -- for instance, a person sitting indicates the likely presence of a chair or sofa. To this end, we propose P2R-Net, which learns a probabilistic 3D model of the objects in a scene, characterized by their class categories and oriented 3D bounding boxes, from an observed human trajectory in the environment. P2R-Net models the probability distribution of object class as well as a deep Gaussian mixture model for object boxes, enabling the sampling of multiple, diverse, likely modes of object configurations from an observed human trajectory. In our experiments, we show that P2R-Net can effectively learn multi-modal distributions of likely objects for human motions and produce a variety of plausible object structures for the environment, even without any visual information. The results demonstrate that P2R-Net consistently outperforms the baselines on the PROX dataset and the VirtualHome platform.
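
To make the sampling idea in the abstract concrete, below is a minimal, hypothetical Python sketch (not the authors' P2R-Net implementation): given a pooled feature of an observed pose trajectory, stand-in prediction heads output a categorical distribution over object classes and a Gaussian mixture over oriented 3D box parameters (center, size, heading), from which multiple diverse hypotheses are sampled. All names, dimensions, and weights here are assumed placeholders.

import numpy as np

rng = np.random.default_rng(0)

NUM_CLASSES = 8   # assumed number of object categories
NUM_MODES = 4     # assumed number of mixture components
BOX_DIM = 7       # center (3) + size (3) + heading angle (1)
FEAT_DIM = 128    # assumed pose-trajectory feature dimension

# Random projections stand in for trained prediction heads.
W_CLS = rng.normal(size=(FEAT_DIM, NUM_CLASSES)) * 0.1
W_PI = rng.normal(size=(FEAT_DIM, NUM_MODES)) * 0.1
W_MU = rng.normal(size=(FEAT_DIM, NUM_MODES * BOX_DIM)) * 0.1


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def predict_heads(pose_feature):
    """Map a pose feature to class logits and Gaussian-mixture parameters."""
    class_logits = pose_feature @ W_CLS
    mix_logits = pose_feature @ W_PI
    means = (pose_feature @ W_MU).reshape(NUM_MODES, BOX_DIM)
    log_sigmas = np.full((NUM_MODES, BOX_DIM), -1.0)  # fixed spread for the sketch
    return class_logits, mix_logits, means, log_sigmas


def sample_object(pose_feature):
    """Draw one (class, box) hypothesis from the predicted distributions."""
    class_logits, mix_logits, means, log_sigmas = predict_heads(pose_feature)
    cls = rng.choice(NUM_CLASSES, p=softmax(class_logits))
    mode = rng.choice(NUM_MODES, p=softmax(mix_logits))   # pick a mixture mode
    box = rng.normal(means[mode], np.exp(log_sigmas[mode]))  # sample a box from that mode
    return cls, box


if __name__ == "__main__":
    feat = rng.normal(size=FEAT_DIM)  # pooled feature of an observed trajectory
    for _ in range(3):                # multiple diverse samples from one observation
        cls, box = sample_object(feat)
        print("class:", cls, "box (cx, cy, cz, l, w, h, theta):", np.round(box, 2))

Sampling a mixture mode before drawing box parameters is what yields distinct, plausible object configurations from a single observed trajectory, mirroring the multi-modal behavior described above.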

Cite

Text

Nie et al. "Pose2Room: Understanding 3D Scenes from Human Activities." Proceedings of the European Conference on Computer Vision (ECCV), 2022. doi:10.1007/978-3-031-19812-0_25

Markdown

[Nie et al. "Pose2Room: Understanding 3D Scenes from Human Activities." Proceedings of the European Conference on Computer Vision (ECCV), 2022.](https://mlanthology.org/eccv/2022/nie2022eccv-pose2room/) doi:10.1007/978-3-031-19812-0_25

BibTeX

@inproceedings{nie2022eccv-pose2room,
  title     = {{Pose2Room: Understanding 3D Scenes from Human Activities}},
  author    = {Nie, Yinyu and Dai, Angela and Han, Xiaoguang and Nießner, Matthias},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2022},
  doi       = {10.1007/978-3-031-19812-0_25},
  url       = {https://mlanthology.org/eccv/2022/nie2022eccv-pose2room/}
}