MIRA: Mental Imagery for Robotic Affordances

Abstract

Humans form mental images of 3D scenes to support counterfactual imagination, planning, and motor control. Our ability to predict the appearance and affordances of a scene from previously unobserved viewpoints helps us perform manipulation tasks (e.g., 6-DoF kitting) with a level of ease that is currently out of reach for existing robot learning frameworks. In this work, we aim to build artificial systems that can analogously plan actions on top of imagined images. To this end, we introduce Mental Imagery for Robotic Affordances (MIRA), an action reasoning framework that optimizes actions with novel-view synthesis and affordance prediction in the loop. Given a set of 2D RGB images, MIRA builds a consistent 3D scene representation, through which we synthesize novel orthographic views amenable to pixel-wise affordance prediction for action optimization. We illustrate how this optimization process enables us to generalize to unseen out-of-plane rotations for 6-DoF robotic manipulation tasks given a limited number of demonstrations, paving the way toward machines that autonomously learn to understand the world around them for planning actions.
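To make the idea of optimizing an action over synthesized views more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a pixel-wise affordance map has already been predicted for each of several candidate orthographic views, and it simply selects the view and pixel with the highest score. The function name `select_action`, the array layout, and the use of rotation tuples to label views are all assumptions made for illustration; view synthesis and the affordance model themselves are outside the scope of this sketch.

```python
import numpy as np


def select_action(affordance_maps, view_rotations):
    """Pick the (view, pixel) pair with the highest affordance score.

    affordance_maps: array of shape (K, H, W), one hypothetical pixel-wise
        affordance map per candidate orthographic view.
    view_rotations: sequence of K labels (here, assumed rotation tuples)
        identifying the orientation of each candidate view.
    """
    maps = np.asarray(affordance_maps, dtype=float)
    if maps.ndim != 3:
        raise ValueError("affordance_maps must have shape (K, H, W)")
    if len(view_rotations) != maps.shape[0]:
        raise ValueError("need exactly one rotation label per view")

    # Exhaustive search over all views and pixels for the maximum score.
    k, row, col = np.unravel_index(np.argmax(maps), maps.shape)
    return view_rotations[k], (int(row), int(col)), float(maps[k, row, col])


# Placeholder data: random scores stand in for predicted affordances.
rng = np.random.default_rng(0)
maps = rng.random((3, 4, 5))
maps[1, 2, 3] = 2.0  # plant a clear maximum in the second view
rotations = [(0.0, 0.0), (15.0, 0.0), (0.0, 15.0)]  # assumed (roll, pitch) in degrees

best_rotation, best_pixel, best_score = select_action(maps, rotations)
```

In this sketch the chosen pixel gives the in-plane position of the action and the chosen view supplies the out-of-plane orientation. A discrete search over a fixed set of views is only one possible strategy and is used here for simplicity.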

Cite

Text

Lin et al. "MIRA: Mental Imagery for Robotic Affordances." Conference on Robot Learning, 2022.

Markdown

[Lin et al. "MIRA: Mental Imagery for Robotic Affordances." Conference on Robot Learning, 2022.](https://mlanthology.org/corl/2022/lin2022corl-mira/)

BibTeX

@inproceedings{lin2022corl-mira,
  title     = {{MIRA: Mental Imagery for Robotic Affordances}},
  author    = {Lin, Yen-Chen and Florence, Pete and Zeng, Andy and Barron, Jonathan T. and Du, Yilun and Ma, Wei-Chiu and Simeonov, Anthony and Garcia, Alberto Rodriguez and Isola, Phillip},
  booktitle = {Conference on Robot Learning},
  year      = {2022},
  pages     = {1916--1927},
  volume    = {205},
  url       = {https://mlanthology.org/corl/2022/lin2022corl-mira/}
}