CALAMARI: Contact-Aware and Language Conditioned Spatial Action MApping for Contact-RIch Manipulation
Abstract
Making contact with purpose is a central part of robot manipulation and remains essential for many household tasks – from sweeping dust into a dustpan, to wiping tables; from erasing whiteboards, to applying paint. In this work, we investigate learning language-conditioned, vision-based manipulation policies wherein the action representation is, in fact, contact itself – predicting contact formations at which tools grasped by the robot should meet an observable surface. Our approach, Contact-Aware and Language conditioned spatial Action MApping for contact-RIch manipulation (CALAMARI), exhibits several advantages, including (i) benefiting from existing visual-language models for pretrained spatial features, grounding instructions to behaviors, and for sim2real transfer; and (ii) factorizing perception and control over a natural boundary (i.e., contact) into two modules that synergize with each other, whereby action predictions can be aligned per pixel with image observations, and low-level controllers can optimize motion trajectories that maintain contact while avoiding penetration. Experiments show that CALAMARI outperforms existing state-of-the-art model architectures for a broad range of contact-rich tasks, and pushes new ground on embodiment-agnostic generalization to unseen objects with varying elasticity, geometry, and colors in both simulated and real-world settings.
Cite
Text
Wi et al. "CALAMARI: Contact-Aware and Language Conditioned Spatial Action MApping for Contact-RIch Manipulation." Conference on Robot Learning, 2023.
Markdown
[Wi et al. "CALAMARI: Contact-Aware and Language Conditioned Spatial Action MApping for Contact-RIch Manipulation." Conference on Robot Learning, 2023.](https://mlanthology.org/corl/2023/wi2023corl-calamari/)
BibTeX
@inproceedings{wi2023corl-calamari,
title = {{CALAMARI: Contact-Aware and Language Conditioned Spatial Action MApping for Contact-RIch Manipulation}},
author = {Wi, Youngsun and Van der Merwe, Mark and Florence, Pete and Zeng, Andy and Fazeli, Nima},
booktitle = {Conference on Robot Learning},
year = {2023},
pages = {2753-2771},
volume = {229},
url = {https://mlanthology.org/corl/2023/wi2023corl-calamari/}
}