Teaching Robots with Show and Tell: Using Foundation Models to Synthesize Robot Policies from Language and Visual Demonstration

Abstract

We introduce a modular, neuro-symbolic framework for teaching robots new skills through language and visual demonstration. Our approach, ShowTell, composes a mixture of foundation models to synthesize robot manipulation programs that are easy to interpret and that generalize across a wide range of tasks and environments. ShowTell is designed to handle complex demonstrations involving high-level logic such as loops and conditionals while remaining intuitive and natural for end users. We validate this approach through a series of real-world robot experiments, showing that ShowTell outperforms a state-of-the-art baseline based on GPT4-V on a variety of tasks and that it generalizes to unseen environments and to within-category objects.

Cite

Text

Murray et al. "Teaching Robots with Show and Tell: Using Foundation Models to Synthesize Robot Policies from Language and Visual Demonstration." Proceedings of The 8th Conference on Robot Learning, 2024.

Markdown

[Murray et al. "Teaching Robots with Show and Tell: Using Foundation Models to Synthesize Robot Policies from Language and Visual Demonstration." Proceedings of The 8th Conference on Robot Learning, 2024.](https://mlanthology.org/corl/2024/murray2024corl-teaching/)

BibTeX

@inproceedings{murray2024corl-teaching,
  title     = {{Teaching Robots with Show and Tell: Using Foundation Models to Synthesize Robot Policies from Language and Visual Demonstration}},
  author    = {Murray, Michael and Gupta, Abhishek and Cakmak, Maya},
  booktitle = {Proceedings of The 8th Conference on Robot Learning},
  year      = {2024},
  pages     = {4033--4050},
  volume    = {270},
  url       = {https://mlanthology.org/corl/2024/murray2024corl-teaching/}
}