On Pre-Training for Visuo-Motor Control: Revisiting a Learning-from-Scratch Baseline

Abstract

In this paper, we examine the effectiveness of pre-training for visuo-motor control tasks. We revisit a simple Learning-from-Scratch (LfS) baseline that incorporates data augmentation and a shallow ConvNet, and find that this baseline is surprisingly competitive with recent approaches (PVR, MVP, R3M) that leverage frozen visual representations trained on large-scale vision datasets – across a variety of algorithms, task domains, and metrics in simulation and on a real robot. Our results demonstrate that these methods are hindered by a significant domain gap between the pre-training datasets and current benchmarks for visuo-motor control, which is alleviated by finetuning. Based on our findings, we provide recommendations for future research in pre-training for control and hope that our simple yet strong baseline will aid in accurately benchmarking progress in this area. Code: https://github.com/gemcollector/learning-from-scratch.

Cite

Text

Hansen et al. "On Pre-Training for Visuo-Motor Control: Revisiting a Learning-from-Scratch Baseline." International Conference on Machine Learning, 2023.

Markdown

[Hansen et al. "On Pre-Training for Visuo-Motor Control: Revisiting a Learning-from-Scratch Baseline." International Conference on Machine Learning, 2023.](https://mlanthology.org/icml/2023/hansen2023icml-pretraining/)

BibTeX

@inproceedings{hansen2023icml-pretraining,
  title     = {{On Pre-Training for Visuo-Motor Control: Revisiting a Learning-from-Scratch Baseline}},
  author    = {Hansen, Nicklas and Yuan, Zhecheng and Ze, Yanjie and Mu, Tongzhou and Rajeswaran, Aravind and Su, Hao and Xu, Huazhe and Wang, Xiaolong},
  booktitle = {International Conference on Machine Learning},
  year      = {2023},
  pages     = {12511--12526},
  volume    = {202},
  url       = {https://mlanthology.org/icml/2023/hansen2023icml-pretraining/}
}