CALAMARI: Contact-Aware and Language Conditioned Spatial Action MApping for Contact-RIch Manipulation

Abstract

Making contact with purpose is a central part of robot manipulation and remains essential for many household tasks, from sweeping dust into a dustpan to wiping tables, and from erasing whiteboards to applying paint. In this work, we investigate learning language-conditioned, vision-based manipulation policies wherein the action representation is, in fact, contact itself: predicting contact formations at which tools grasped by the robot should meet an observable surface. Our approach, Contact-Aware and Language conditioned spatial Action MApping for contact-RIch manipulation (CALAMARI), exhibits several advantages, including (i) benefiting from existing vision-language models for pretrained spatial features, for grounding instructions to behaviors, and for sim2real transfer; and (ii) factorizing perception and control over a natural boundary (i.e., contact) into two modules that synergize with each other, whereby action predictions can be aligned per pixel with image observations, and low-level controllers can optimize motion trajectories that maintain contact while avoiding penetration. Experiments show that CALAMARI outperforms existing state-of-the-art model architectures across a broad range of contact-rich tasks, and breaks new ground on embodiment-agnostic generalization to unseen objects with varying elasticity, geometry, and colors in both simulated and real-world settings.
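To make the idea of a pixel-aligned contact action more concrete, the sketch below shows one generic way a per-pixel, language-conditioned score map can be turned into 3-D contact targets for a downstream controller. It is a minimal illustration under stated assumptions, not CALAMARI's architecture: the per-pixel scores here are simply cosine similarities between hypothetical pixel features and a text embedding, and the helper names (`contact_heatmap`, `top_k_contact_pixels`, `deproject`), the softmax `temperature`, and the pinhole intrinsics `fx`, `fy`, `cx`, `cy` are all illustrative assumptions. It covers only the perception side; the contact-maintaining, penetration-avoiding controller is out of scope.

```python
import numpy as np


def contact_heatmap(pixel_feats: np.ndarray, text_emb: np.ndarray,
                    temperature: float = 0.1) -> np.ndarray:
    """Score every pixel against a language embedding and softmax over the image.

    Hypothetical stand-in for a learned, language-conditioned decoder:
    pixel_feats has shape (H, W, D), text_emb has shape (D,).
    """
    # Cosine similarity between each pixel feature and the instruction embedding.
    f = pixel_feats / (np.linalg.norm(pixel_feats, axis=-1, keepdims=True) + 1e-8)
    t = text_emb / (np.linalg.norm(text_emb) + 1e-8)
    logits = (f @ t) / temperature           # shape (H, W)
    logits = logits - logits.max()           # numerical stability before exp
    probs = np.exp(logits)
    return probs / probs.sum()               # probabilities over all pixels


def top_k_contact_pixels(heatmap: np.ndarray, k: int) -> np.ndarray:
    """Return (row, col) indices of the k highest-scoring pixels, best first."""
    flat = np.argsort(heatmap, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(flat, heatmap.shape)
    return np.stack([rows, cols], axis=1)


def deproject(pixels: np.ndarray, depth: np.ndarray,
              fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Lift (row, col) pixels to camera-frame 3-D points with a pinhole model."""
    v, u = pixels[:, 0], pixels[:, 1]        # row -> v (y axis), col -> u (x axis)
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)


# Usage with synthetic data: one pixel is made parallel to the text embedding.
rng = np.random.default_rng(0)
H, W, D = 4, 5, 8
feats = rng.normal(size=(H, W, D))
text = rng.normal(size=D)
feats[2, 3] = 5.0 * text                     # this pixel should win
hm = contact_heatmap(feats, text)
px = top_k_contact_pixels(hm, k=1)
depth = np.full((H, W), 0.5)                 # flat surface 0.5 m from the camera
pts = deproject(px, depth, fx=100.0, fy=100.0, cx=2.0, cy=1.5)
```

Because the prediction lives in image space, each score is tied to a specific pixel, and depth plus camera intrinsics are enough to hand a 3-D contact location to whatever low-level controller executes the motion.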

Cite

Text

Wi et al. "CALAMARI: Contact-Aware and Language Conditioned Spatial Action MApping for Contact-RIch Manipulation." Conference on Robot Learning, 2023.

BibTeX

@inproceedings{wi2023corl-calamari,
  title     = {{CALAMARI: Contact-Aware and Language Conditioned Spatial Action MApping for Contact-RIch Manipulation}},
  author    = {Wi, Youngsun and Van der Merwe, Mark and Florence, Pete and Zeng, Andy and Fazeli, Nima},
  booktitle = {Conference on Robot Learning},
  year      = {2023},
  pages     = {2753--2771},
  volume    = {229},
  url       = {https://mlanthology.org/corl/2023/wi2023corl-calamari/}
}