SORNet: Spatial Object-Centric Representations for Sequential Manipulation

Abstract

Sequential manipulation tasks require a robot to perceive the state of an environment and plan a sequence of actions leading to a desired goal state, where the ability to reason about spatial relationships among object entities from raw sensor inputs is crucial. Prior works relying on explicit state estimation or end-to-end learning struggle with novel objects or new tasks. In this work, we propose SORNet (Spatial Object-Centric Representation Network), which extracts object-centric representations from RGB images conditioned on canonical views of the objects of interest. We show that the object embeddings learned by SORNet generalize zero-shot to unseen object entities on three spatial reasoning tasks: spatial relationship classification, skill precondition classification and relative direction regression, significantly outperforming baselines. Further, we present real-world robotic experiments demonstrating the usage of the learned object embeddings in task planning for sequential manipulation.
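The core idea in the abstract, producing per-object embeddings from an RGB image conditioned on canonical views of the queried objects, can be pictured as a transformer that mixes scene patch tokens with one token per canonical object view and reads the object tokens back out. The snippet below is a minimal, hypothetical sketch of that interface under our own assumptions (class and argument names such as `SORNetSketch` and `canonical_views` are illustrative; positional encodings and the downstream readout heads are omitted). It is not the authors' released implementation.

```python
# Minimal sketch, not the authors' code: scene patches + canonical object views
# go through a shared transformer encoder; the object tokens become embeddings.
import torch
import torch.nn as nn


class SORNetSketch(nn.Module):
    """Per-object embeddings from an RGB image, conditioned on canonical object views."""

    def __init__(self, patch=32, dim=256, depth=4, heads=8):
        super().__init__()
        # Shared patch embedding for both the scene image and the canonical views.
        self.patchify = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        # Positional encodings omitted here for brevity.

    def forward(self, image, canonical_views):
        # image: (B, 3, H, W); canonical_views: (B, K, 3, p, p), one crop per object of interest.
        B, K = canonical_views.shape[:2]
        scene_tokens = self.patchify(image).flatten(2).transpose(1, 2)   # (B, N, dim)
        obj_tokens = self.patchify(canonical_views.flatten(0, 1))        # (B*K, dim, 1, 1)
        obj_tokens = obj_tokens.flatten(1).view(B, K, -1)                # (B, K, dim)
        tokens = torch.cat([obj_tokens, scene_tokens], dim=1)
        out = self.encoder(tokens)
        return out[:, :K]  # one embedding per queried object
```

In the paper, lightweight readout networks consume these per-object embeddings for the three evaluation tasks (spatial relationship classification, skill precondition classification, and relative direction regression); because objects are specified by their canonical views rather than a fixed class vocabulary, the same model can be queried zero-shot with unseen objects.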

Cite

Text

Yuan et al. "SORNet: Spatial Object-Centric Representations for Sequential Manipulation." Conference on Robot Learning, 2021.

Markdown

[Yuan et al. "SORNet: Spatial Object-Centric Representations for Sequential Manipulation." Conference on Robot Learning, 2021.](https://mlanthology.org/corl/2021/yuan2021corl-sornet/)

BibTeX

@inproceedings{yuan2021corl-sornet,
  title     = {{SORNet: Spatial Object-Centric Representations for Sequential Manipulation}},
  author    = {Yuan, Wentao and Paxton, Chris and Desingh, Karthik and Fox, Dieter},
  booktitle = {Conference on Robot Learning},
  year      = {2021},
  pages     = {148--157},
  volume    = {164},
  url       = {https://mlanthology.org/corl/2021/yuan2021corl-sornet/}
}