M2T2: Multi-Task Masked Transformer for Object-Centric Pick and Place

Abstract

With the advent of large language models and large-scale robotic datasets, there has been tremendous progress in high-level decision-making for object manipulation. These generic models are able to interpret complex tasks using language commands, but they often have difficulty generalizing to out-of-distribution objects due to the limited capabilities of their low-level action primitives. In contrast, existing task-specific models excel in low-level manipulation of unknown objects, but only work for a single type of action. To bridge this gap, we present M2T2, a single model that supplies different types of low-level actions that work robustly on arbitrary objects in cluttered scenes. M2T2 is a transformer model which reasons about contact points and predicts valid gripper poses for different action modes given a raw point cloud of the scene. Trained on a large-scale synthetic dataset with 128K scenes, M2T2 achieves zero-shot sim2real transfer on the real robot, outperforming the baseline system with state-of-the-art task-specific models by about 19% in overall performance and 37.5% in challenging scenes where the object needs to be re-oriented for collision-free placement. M2T2 also achieves state-of-the-art results on a subset of language-conditioned tasks in RLBench. Videos of robot experiments on unseen objects in both the real world and simulation are available at m2-t2.github.io.

Cite

Text

Yuan et al. "M2T2: Multi-Task Masked Transformer for Object-Centric Pick and Place." Conference on Robot Learning, 2023.

Markdown

[Yuan et al. "M2T2: Multi-Task Masked Transformer for Object-Centric Pick and Place." Conference on Robot Learning, 2023.](https://mlanthology.org/corl/2023/yuan2023corl-m2t2/)

BibTeX

@inproceedings{yuan2023corl-m2t2,
  title     = {{M2T2: Multi-Task Masked Transformer for Object-Centric Pick and Place}},
  author    = {Yuan, Wentao and Murali, Adithyavairavan and Mousavian, Arsalan and Fox, Dieter},
  booktitle = {Conference on Robot Learning},
  year      = {2023},
  pages     = {3619--3630},
  volume    = {229},
  url       = {https://mlanthology.org/corl/2023/yuan2023corl-m2t2/}
}