VIOLA: Imitation Learning for Vision-Based Manipulation with Object Proposal Priors

Abstract

We introduce VIOLA, an object-centric imitation learning approach to learning closed-loop visuomotor policies for robot manipulation. Our approach constructs object-centric representations based on general object proposals from a pre-trained vision model. VIOLA uses a transformer-based policy to reason over these representations and attend to the task-relevant visual factors for action prediction. Such object-based structural priors improve the robustness of deep imitation learning algorithms against object variations and environmental perturbations. We quantitatively evaluate VIOLA in simulation and on real robots. VIOLA outperforms the state-of-the-art imitation learning methods by 45.8% in success rate. It has also been deployed successfully on a physical robot to solve challenging long-horizon tasks, such as dining table arrangement and coffee making. More videos and model details can be found in the supplementary material and on the project website: https://ut-austin-rpl.github.io/VIOLA/.

Cite

Text

Zhu et al. "VIOLA: Imitation Learning for Vision-Based Manipulation with Object Proposal Priors." Conference on Robot Learning, 2022.

Markdown

[Zhu et al. "VIOLA: Imitation Learning for Vision-Based Manipulation with Object Proposal Priors." Conference on Robot Learning, 2022.](https://mlanthology.org/corl/2022/zhu2022corl-viola/)

BibTeX

@inproceedings{zhu2022corl-viola,
  title     = {{VIOLA: Imitation Learning for Vision-Based Manipulation with Object Proposal Priors}},
  author    = {Zhu, Yifeng and Joshi, Abhishek and Stone, Peter and Zhu, Yuke},
  booktitle = {Conference on Robot Learning},
  year      = {2022},
  pages     = {1199--1210},
  volume    = {205},
  url       = {https://mlanthology.org/corl/2022/zhu2022corl-viola/}
}