RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation
Abstract
The ability to leverage heterogeneous robotic experience from different robots and tasks to quickly master novel skills and embodiments has the potential to transform robot learning. Inspired by recent advances in foundation models for vision and language, we propose a multi-embodiment, multi-task generalist agent for robotic manipulation. This agent, named RoboCat, is a visual goal-conditioned decision transformer capable of consuming action-labelled visual experience. This data spans a large repertoire of motor control skills from simulated and real robotic arms with varying sets of observations and actions. With RoboCat, we demonstrate the ability to generalise to new tasks and robots, both zero-shot as well as through adaptation using only 100–1000 examples for the target task. We also show how a trained model itself can be used to generate data for subsequent training iterations, thus providing a basic building block for an autonomous improvement loop. We investigate the agent’s capabilities, with large-scale evaluations both in simulation and on three different real robot embodiments. We find that as we grow and diversify its training data, RoboCat not only shows signs of cross-task transfer, but also becomes more efficient at adapting to new tasks.
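The self-improvement loop described above (train an agent, use it to generate new data, keep successful episodes, retrain) can be sketched in miniature. This is a toy, assumption-laden illustration: a tabular majority-vote "policy" stands in for RoboCat's goal-conditioned decision transformer, and the function names (`train`, `rollout`, `self_improve`, `is_success`) are hypothetical, not from the paper.

```python
import random

def train(dataset):
    """Fit a trivial tabular 'policy': for each (observation, goal) pair,
    pick the most frequent action seen in the dataset. A stand-in for
    training the goal-conditioned transformer on action-labelled data."""
    counts = {}
    for obs, goal, action in dataset:
        key = (obs, goal)
        counts.setdefault(key, {})
        counts[key][action] = counts[key].get(action, 0) + 1
    return {k: max(v, key=v.get) for k, v in counts.items()}

def rollout(policy, obs, goal):
    """Act with the trained policy; fall back to a random action
    on unseen (observation, goal) pairs."""
    return policy.get((obs, goal), random.choice(["left", "right"]))

def self_improve(dataset, episodes, is_success, n_rounds=3):
    """One RoboCat-style improvement loop: train, deploy the trained
    agent to collect new episodes, keep the successful ones, and
    retrain on the grown dataset."""
    for _ in range(n_rounds):
        policy = train(dataset)
        for obs, goal in episodes:
            action = rollout(policy, obs, goal)
            if is_success(obs, goal, action):
                dataset.append((obs, goal, action))  # self-generated data
    return train(dataset)
```

In the paper the "keep successful episodes" step corresponds to generating data with a fine-tuned agent for subsequent training iterations; here it is reduced to a success-predicate filter for clarity.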
Cite
Text
Bousmalis et al. "RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation." Transactions on Machine Learning Research, 2024.
Markdown
[Bousmalis et al. "RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/bousmalis2024tmlr-robocat/)
BibTeX
@article{bousmalis2024tmlr-robocat,
  title   = {{RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation}},
  author  = {Bousmalis, Konstantinos and Vezzani, Giulia and Rao, Dushyant and Devin, Coline Manon and Lee, Alex X. and Villalonga, Maria Bauza and Davchev, Todor and Zhou, Yuxiang and Gupta, Agrim and Raju, Akhil and Laurens, Antoine and Fantacci, Claudio and Dalibard, Valentin and Zambelli, Martina and Martins, Murilo Fernandes and Pevceviciute, Rugile and Blokzijl, Michiel and Denil, Misha and Batchelor, Nathan and Lampe, Thomas and Parisotto, Emilio and Zolna, Konrad and Reed, Scott and Colmenarejo, Sergio Gómez and Scholz, Jonathan and Abdolmaleki, Abbas and Groth, Oliver and Regli, Jean-Baptiste and Sushkov, Oleg and Rothörl, Thomas and Chen, Jose Enrique and Aytar, Yusuf and Barker, David and Ortiz, Joy and Riedmiller, Martin and Springenberg, Jost Tobias and Hadsell, Raia and Nori, Francesco and Heess, Nicolas},
  journal = {Transactions on Machine Learning Research},
  year    = {2024},
  url     = {https://mlanthology.org/tmlr/2024/bousmalis2024tmlr-robocat/}
}