Pre-Training with Synthetic Data Helps Offline Reinforcement Learning

Abstract

Recently, it has been shown that for offline deep reinforcement learning (DRL), pre-training Decision Transformer with a large language corpus can improve downstream performance (Reid et al., 2022). A natural question is whether this performance gain can only be achieved with language pre-training, or whether it can also be achieved with simpler pre-training schemes that do not involve language. In this paper, we first show that language is not essential for improved performance, and indeed pre-training with synthetic IID data for a small number of updates can match the performance gains from pre-training with a large language corpus; moreover, pre-training with data generated by a one-step Markov chain can further improve the performance. Inspired by these experimental results, we then consider pre-training Conservative Q-Learning (CQL), a popular offline DRL algorithm, which is Q-learning-based and typically employs a Multi-Layer Perceptron (MLP) backbone. Surprisingly, pre-training with simple synthetic data for a small number of updates can also improve CQL, providing consistent performance improvement on D4RL Gym locomotion datasets. The results of this paper not only illustrate the importance of pre-training for offline DRL but also show that the pre-training data can be synthetic and generated with remarkably simple mechanisms.
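To make the two synthetic data sources mentioned in the abstract concrete, below is a minimal Python sketch (not the authors' released code) of generating pre-training sequences with IID tokens and with a one-step Markov chain. The vocabulary size, sequence length, and Dirichlet prior on the transition rows are illustrative assumptions, not values from the paper.

import numpy as np

rng = np.random.default_rng(0)
VOCAB_SIZE = 100      # assumed size of the synthetic token vocabulary
SEQ_LEN = 1024        # assumed length of each pre-training sequence

def iid_sequence(vocab_size=VOCAB_SIZE, seq_len=SEQ_LEN):
    """Synthetic IID data: each token is drawn uniformly and independently."""
    return rng.integers(0, vocab_size, size=seq_len)

def markov_sequence(vocab_size=VOCAB_SIZE, seq_len=SEQ_LEN, alpha=0.1):
    """Synthetic data from a one-step Markov chain: the next token depends
    only on the current token through a fixed random transition matrix."""
    # Each row of the transition matrix is sampled from a Dirichlet prior
    # (an assumption made here for concreteness).
    transition = rng.dirichlet(alpha * np.ones(vocab_size), size=vocab_size)
    seq = np.empty(seq_len, dtype=np.int64)
    seq[0] = rng.integers(0, vocab_size)
    for t in range(1, seq_len):
        seq[t] = rng.choice(vocab_size, p=transition[seq[t - 1]])
    return seq

# Such sequences would then be used for a small number of next-token
# prediction updates before fine-tuning on the offline RL objective.

This is only a sketch of the data-generation step; how the sequences are fed to the Decision Transformer or CQL backbone follows the respective algorithm's standard training pipeline.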

Cite

Text

Wang et al. "Pre-Training with Synthetic Data Helps Offline Reinforcement Learning." International Conference on Learning Representations, 2024.

Markdown

[Wang et al. "Pre-Training with Synthetic Data Helps Offline Reinforcement Learning." International Conference on Learning Representations, 2024.](https://mlanthology.org/iclr/2024/wang2024iclr-pretraining/)

BibTeX

@inproceedings{wang2024iclr-pretraining,
  title     = {{Pre-Training with Synthetic Data Helps Offline Reinforcement Learning}},
  author    = {Wang, Zecheng and Wang, Che and Dong, Zixuan and Ross, Keith W.},
  booktitle = {International Conference on Learning Representations},
  year      = {2024},
  url       = {https://mlanthology.org/iclr/2024/wang2024iclr-pretraining/}
}