In-Context Imitation Learning via Next-Token Prediction

Abstract

We explore how to enable the in-context learning capabilities of next-token prediction models for robotics, allowing the model to perform novel tasks when prompted with human teleoperation demonstrations, without fine-tuning. We propose the In-Context Robot Transformer (ICRT), a causal transformer that performs autoregressive prediction on sensorimotor trajectories comprising images, proprioceptive states, and actions. This approach enables flexible, training-free execution of new tasks at test time: the model is simply prompted with demonstration trajectories of the new task. Experiments with a Franka Emika robot demonstrate that ICRT can adapt to new tasks specified by prompts, even in environment configurations that differ from both the prompts and the training data. In a multitask environment setup, ICRT significantly outperforms current state-of-the-art robot foundation models in generalizing to unseen tasks.
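
To make the architecture concrete, below is a minimal sketch of the core idea: a causal transformer that autoregressively predicts over interleaved image, proprioceptive-state, and action tokens, with new tasks specified by prepending demonstration trajectories as a prompt. This is not the authors' implementation; the class and parameter names (ICRTSketch, img_proj, token dimensions, the per-timestep interleaving order) are illustrative assumptions, and images are assumed to be pre-encoded into fixed-size feature vectors.

```python
# Minimal sketch (not the ICRT codebase) of in-context imitation learning
# via next-token prediction over interleaved sensorimotor tokens.
import torch
import torch.nn as nn


class ICRTSketch(nn.Module):
    def __init__(self, img_dim=512, state_dim=8, action_dim=7,
                 d_model=256, n_layers=4, n_heads=8, max_len=1024):
        super().__init__()
        # Project each modality into a shared token space.
        self.img_proj = nn.Linear(img_dim, d_model)
        self.state_proj = nn.Linear(state_dim, d_model)
        self.action_proj = nn.Linear(action_dim, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)  # 3 tokens per timestep
        layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model,
                                           batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, n_layers)
        self.action_head = nn.Linear(d_model, action_dim)

    def forward(self, images, states, actions):
        # images:  (B, T, img_dim)    pre-extracted visual features
        # states:  (B, T, state_dim)  proprioceptive states
        # actions: (B, T, action_dim) actions (zeros at steps being predicted)
        B, T, _ = images.shape
        # Interleave (image, state, action) per timestep -> sequence length 3T.
        tokens = torch.stack(
            [self.img_proj(images), self.state_proj(states),
             self.action_proj(actions)], dim=2
        ).reshape(B, 3 * T, -1)
        tokens = tokens + self.pos_emb(torch.arange(3 * T, device=tokens.device))
        # Causal mask: each token attends only to earlier tokens, so the
        # zero placeholder for the current action never leaks into its
        # own prediction.
        mask = nn.Transformer.generate_square_subsequent_mask(3 * T).to(tokens.device)
        h = self.transformer(tokens, mask=mask)
        # Predict a_t from the state token at index 3t + 1, i.e., from
        # (..., image_t, state_t), the last inputs visible before a_t.
        return self.action_head(h[:, 1::3, :])
```

At test time, the prompt is one or more demonstration trajectories of the new task concatenated with the current observation, whose action slot is filled with zeros; the prediction at the final timestep is executed on the robot and the loop repeats:

```python
model = ICRTSketch()
# Hypothetical prompt: a 10-step demonstration, then the current observation.
imgs = torch.randn(1, 11, 512)
states = torch.randn(1, 11, 8)
acts = torch.cat([torch.randn(1, 10, 7), torch.zeros(1, 1, 7)], dim=1)
pred = model(imgs, states, acts)   # (1, 11, 7)
next_action = pred[:, -1]          # action for the current step
```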

Cite

Text

Fu et al. "In-Context Imitation Learning via Next-Token Prediction." NeurIPS 2024 Workshops: OWA, 2024.

Markdown

[Fu et al. "In-Context Imitation Learning via Next-Token Prediction." NeurIPS 2024 Workshops: OWA, 2024.](https://mlanthology.org/neuripsw/2024/fu2024neuripsw-incontext/)

BibTeX

@inproceedings{fu2024neuripsw-incontext,
  title     = {{In-Context Imitation Learning via Next-Token Prediction}},
  author    = {Fu, Letian and Huang, Huang and Datta, Gaurav and Chen, Lawrence Yunliang and Panitch, William Chung-Ho and Liu, Fangchen and Li, Hui and Goldberg, Ken},
  booktitle = {NeurIPS 2024 Workshops: OWA},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/fu2024neuripsw-incontext/}
}