Imitating Task and Motion Planning with Visuomotor Transformers

Abstract

Imitation learning is a powerful tool for training robot manipulation policies, allowing them to learn from expert demonstrations without manual programming or trial-and-error. However, common methods of data collection, such as human supervision, scale poorly, as they are time-consuming and labor-intensive. In contrast, Task and Motion Planning (TAMP) can autonomously generate large-scale datasets of diverse demonstrations. In this work, we show that combining large-scale datasets generated by TAMP supervisors with flexible Transformer models to fit them is a powerful paradigm for robot manipulation. We present a novel imitation learning system called OPTIMUS that trains large-scale visuomotor Transformer policies by imitating a TAMP agent. We conduct a thorough study of the design decisions required to imitate TAMP and demonstrate that OPTIMUS can solve a wide variety of challenging vision-based manipulation tasks with over 70 different objects, ranging from long-horizon pick-and-place tasks to shelf and articulated object manipulation, achieving 70% to 80% success rates. Video results and code are available at https://mihdalal.github.io/optimus/

Cite

Text

Dalal et al. "Imitating Task and Motion Planning with Visuomotor Transformers." Conference on Robot Learning, 2023.

Markdown

[Dalal et al. "Imitating Task and Motion Planning with Visuomotor Transformers." Conference on Robot Learning, 2023.](https://mlanthology.org/corl/2023/dalal2023corl-imitating/)

BibTeX

@inproceedings{dalal2023corl-imitating,
  title     = {{Imitating Task and Motion Planning with Visuomotor Transformers}},
  author    = {Dalal, Murtaza and Mandlekar, Ajay and Garrett, Caelan Reed and Handa, Ankur and Salakhutdinov, Ruslan and Fox, Dieter},
  booktitle = {Conference on Robot Learning},
  year      = {2023},
  pages     = {2565--2593},
  volume    = {229},
  url       = {https://mlanthology.org/corl/2023/dalal2023corl-imitating/}
}