Visually-Grounded Library of Behaviors for Manipulating Diverse Objects Across Diverse Configurations and Views

Abstract

We propose a visually-grounded library of behaviors approach for learning to manipulate diverse objects across varying initial and goal configurations and camera placements. Our key innovation is to disentangle the standard image-to-action mapping into two separate modules that use different types of perceptual input: (1) a behavior selector which conditions on intrinsic and semantically-rich object appearance features to select the behaviors that can successfully perform the desired tasks on the object at hand, and (2) a library of behaviors each of which conditions on extrinsic and abstract object properties, such as object location and pose, to predict actions to execute over time. The selector uses a semantically-rich 3D object feature representation extracted from images in a differentiable end-to-end manner. This representation is trained to be view-invariant and affordance-aware using self-supervision, by predicting varying views and successful object manipulations. We test our framework on pushing and grasping diverse objects in simulation as well as transporting rigid, granular, and liquid food ingredients in a real robot setup. Our model outperforms image-to-action mappings that do not factorize static and dynamic object properties. We further ablate the contribution of the selector’s input and show the benefits of the proposed view-predictive, affordance-aware 3D visual object representations.
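The abstract describes a factorization of the image-to-action mapping into a behavior selector (driven by object appearance features) and a library of behaviors (each driven by extrinsic state such as location and pose). Below is a minimal sketch of that interface, not the authors' implementation: the class names, network sizes, and the stand-in feature and state tensors are illustrative assumptions.

# Sketch of the selector/behavior-library factorization described in the abstract.
# All module names and dimensions are assumptions for illustration only.
import torch
import torch.nn as nn

class BehaviorSelector(nn.Module):
    """Scores each behavior in the library from a view-invariant object feature."""
    def __init__(self, feat_dim: int, num_behaviors: int):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, num_behaviors),
        )

    def forward(self, object_feature: torch.Tensor) -> torch.Tensor:
        # One score per behavior; the argmax picks the behavior to execute.
        return self.scorer(object_feature)

class Behavior(nn.Module):
    """One library entry: maps extrinsic state (e.g., object location, pose) to an action."""
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.policy = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, action_dim),
        )

    def forward(self, extrinsic_state: torch.Tensor) -> torch.Tensor:
        return self.policy(extrinsic_state)

# Usage: select a behavior from object appearance, then query it with extrinsic state.
feat_dim, state_dim, action_dim, num_behaviors = 256, 7, 4, 8
selector = BehaviorSelector(feat_dim, num_behaviors)
library = nn.ModuleList(Behavior(state_dim, action_dim) for _ in range(num_behaviors))

object_feature = torch.randn(1, feat_dim)    # stand-in for the 3D object representation
extrinsic_state = torch.randn(1, state_dim)  # stand-in for estimated location/pose

behavior_id = selector(object_feature).argmax(dim=-1).item()
action = library[behavior_id](extrinsic_state)

The point of the split is that the selector only sees appearance (what the object is), while each behavior only sees abstract pose information (where the object is), which is what lets the library generalize across configurations and views.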

Cite

Text

Yang et al. "Visually-Grounded Library of Behaviors for Manipulating Diverse Objects Across Diverse Configurations and Views." Conference on Robot Learning, 2021.

Markdown

[Yang et al. "Visually-Grounded Library of Behaviors for Manipulating Diverse Objects Across Diverse Configurations and Views." Conference on Robot Learning, 2021.](https://mlanthology.org/corl/2021/yang2021corl-visuallygrounded/)

BibTeX

@inproceedings{yang2021corl-visuallygrounded,
  title     = {{Visually-Grounded Library of Behaviors for Manipulating Diverse Objects Across Diverse Configurations and Views}},
  author    = {Yang, Jingyun and Tung, Hsiao-Yu and Zhang, Yunchu and Pathak, Gaurav and Pokle, Ashwini and Atkeson, Christopher G and Fragkiadaki, Katerina},
  booktitle = {Conference on Robot Learning},
  year      = {2021},
  pages     = {695--705},
  volume    = {164},
  url       = {https://mlanthology.org/corl/2021/yang2021corl-visuallygrounded/}
}