PLEX: Making the Most of the Available Data for Robotic Manipulation Pretraining
Abstract
A rich representation is key to general robotic manipulation, but existing approaches to representation learning require large amounts of multimodal demonstrations. In this work we propose PLEX, a transformer-based architecture that learns from a small number of task-agnostic visuomotor trajectories and a much larger amount of task-conditioned object manipulation videos – a type of data available in quantity. PLEX uses visuomotor trajectories to induce a latent feature space and to learn task-agnostic manipulation routines, while diverse video-only demonstrations teach PLEX how to plan in the induced latent feature space for a wide variety of tasks. Experiments showcase PLEX’s generalization on Meta-World and state-of-the-art performance in challenging Robosuite environments. In particular, using relative positional encoding in PLEX’s transformers greatly helps in low-data regimes of learning from human-collected demonstrations.
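As a rough illustration of the architecture the abstract describes, the sketch below separates PLEX into a planner that predicts future latent observations (trainable from video-only demonstrations, since it never sees actions) and an executor that maps the current and planned latents to actions (trainable from the smaller set of visuomotor trajectories). This is a minimal sketch, not the authors' implementation; all class names, method names, and dimensions (`LatentPlanner`, `Executor`, `plex_step`, `latent_dim`, etc.) are hypothetical.

```python
# Minimal sketch of a PLEX-style planner/executor split (hypothetical names and sizes).
import torch
import torch.nn as nn


class LatentPlanner(nn.Module):
    """Predicts future latent observations conditioned on a task embedding.
    Trainable from video-only data because it never consumes actions."""

    def __init__(self, latent_dim=128, n_layers=4, n_heads=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(latent_dim, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(latent_dim, latent_dim)

    def forward(self, latent_obs, task_emb):
        # latent_obs: (B, T, D); task_emb: (B, D) prepended as a conditioning token.
        tokens = torch.cat([task_emb.unsqueeze(1), latent_obs], dim=1)
        return self.head(self.backbone(tokens)[:, 1:])  # predicted next latents, (B, T, D)


class Executor(nn.Module):
    """Maps the current latent observation and a planned target latent to an action.
    Trainable from task-agnostic visuomotor trajectories."""

    def __init__(self, latent_dim=128, action_dim=7):
        super().__init__()
        self.policy = nn.Sequential(
            nn.Linear(2 * latent_dim, 256), nn.ReLU(), nn.Linear(256, action_dim)
        )

    def forward(self, latent_obs, planned_latent):
        return self.policy(torch.cat([latent_obs, planned_latent], dim=-1))


def plex_step(encoder, planner, executor, obs_history, task_emb):
    """One control step: encode observations, plan the next latent, act toward it."""
    latents = encoder(obs_history)                # (B, T, D)
    planned = planner(latents, task_emb)[:, -1]   # next-step latent plan
    return executor(latents[:, -1], planned)      # action, (B, action_dim)
```

Under this split, the planner and executor can be pretrained on their respective data sources and then composed at control time; the relative positional encoding the abstract mentions would replace the transformer's absolute position information, but is omitted here for brevity.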
Cite
Text
Thomas et al. "PLEX: Making the Most of the Available Data for Robotic Manipulation Pretraining." Conference on Robot Learning, 2023.
Markdown
[Thomas et al. "PLEX: Making the Most of the Available Data for Robotic Manipulation Pretraining." Conference on Robot Learning, 2023.](https://mlanthology.org/corl/2023/thomas2023corl-plex/)
BibTeX
@inproceedings{thomas2023corl-plex,
title = {{PLEX: Making the Most of the Available Data for Robotic Manipulation Pretraining}},
author = {Thomas, Garrett and Cheng, Ching-An and Loynd, Ricky and Frujeri, Felipe Vieira and Vineet, Vibhav and Jalobeanu, Mihai and Kolobov, Andrey},
booktitle = {Conference on Robot Learning},
year = {2023},
pages = {2624--2641},
volume = {229},
url = {https://mlanthology.org/corl/2023/thomas2023corl-plex/}
}