Learning 3D Dynamic Scene Representations for Robot Manipulation

Abstract

3D scene representation for robot manipulation should capture three key object properties: permanency: objects that become occluded over time continue to exist; amodal completeness: objects have 3D occupancy even when only partial observations are available; spatiotemporal continuity: each object's movement is continuous over space and time. In this paper, we introduce 3D Dynamic Scene Representation (DSR), a 3D volumetric scene representation that simultaneously discovers, tracks, and reconstructs objects and predicts their dynamics while capturing all three properties. We further propose DSR-Net, which learns to aggregate visual observations over multiple interactions to gradually build and refine DSR. Our model achieves state-of-the-art performance in modeling 3D scene dynamics with DSR on both simulated and real data. Combined with model predictive control, DSR-Net enables accurate planning in downstream robotic manipulation tasks such as planar pushing. Code and data are available at dsr-net.cs.columbia.edu.
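The planning setup the abstract mentions, a learned forward model combined with model predictive control, can be illustrated with a minimal random-shooting MPC sketch. Everything below is a generic illustration under stated assumptions, not the paper's implementation: `toy_predict` is a one-dimensional stand-in for the learned dynamics (DSR-Net's actual state is a 3D volumetric scene representation, and its actions are pushes), and the cost is simply distance to a goal.

```python
import random

def toy_predict(state, action):
    """Stand-in for a learned forward model (the paper uses DSR-Net)."""
    return state + action

def mpc_plan(state, goal, rng, horizon=5, num_samples=64):
    """Random-shooting MPC: sample candidate action sequences, roll each
    out through the model, and return the first action of the cheapest."""
    best_cost, best_first = float("inf"), 0.0
    for _ in range(num_samples):
        seq = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, c = state, 0.0
        for a in seq:              # roll out the candidate sequence
            s = toy_predict(s, a)
            c += abs(s - goal)     # accumulate per-step distance to goal
        if c < best_cost:
            best_cost, best_first = c, seq[0]
    return best_first

# Closed loop: execute only the first planned action, then replan.
rng = random.Random(0)
state, goal = 0.0, 3.0
for _ in range(20):
    state = toy_predict(state, mpc_plan(state, goal, rng))
final_gap = abs(state - goal)
```

Executing only the first action and replanning at every step is what makes this model-predictive rather than open-loop: errors in the learned model are corrected by fresh observations at each interaction.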

Cite

Text

Xu et al. "Learning 3D Dynamic Scene Representations for Robot Manipulation." Conference on Robot Learning, 2020.

Markdown

[Xu et al. "Learning 3D Dynamic Scene Representations for Robot Manipulation." Conference on Robot Learning, 2020.](https://mlanthology.org/corl/2020/xu2020corl-learning/)

BibTeX

@inproceedings{xu2020corl-learning,
  title     = {{Learning 3D Dynamic Scene Representations for Robot Manipulation}},
  author    = {Xu, Zhenjia and He, Zhanpeng and Wu, Jiajun and Song, Shuran},
  booktitle = {Conference on Robot Learning},
  year      = {2020},
  pages     = {126--142},
  volume    = {155},
  url       = {https://mlanthology.org/corl/2020/xu2020corl-learning/}
}