Efficient Online Reinforcement Learning with Offline Data

Abstract

Sample efficiency and exploration remain major challenges in online reinforcement learning (RL). A powerful approach that can be applied to address these issues is the inclusion of offline data, such as prior trajectories from a human expert or a sub-optimal exploration policy. Previous methods have relied on extensive modifications and additional complexity to ensure the effective use of this data. Instead, we ask: can we simply apply existing off-policy methods to leverage offline data when learning online? In this work, we demonstrate that the answer is yes; however, a set of minimal but important changes to existing off-policy RL algorithms is required to achieve reliable performance. We extensively ablate these design choices, demonstrating the key factors that most affect performance, and arrive at a set of recommendations that practitioners can readily apply, whether their data comprise a small number of expert demonstrations or large volumes of sub-optimal trajectories. We see that correct application of these simple recommendations can provide a $\mathbf{2.5\times}$ improvement over existing approaches across a diverse set of competitive benchmarks, with no additional computational overhead.
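
The abstract does not enumerate the specific changes. As an illustration only (an assumption on this editor's part, not a statement of the paper's exact recipe), one simple way to fold offline data into standard off-policy training is to draw each gradient batch symmetrically, half from the fixed offline dataset and half from the growing online replay buffer. The sketch below shows this idea in plain Python/NumPy; the buffer layout, field names, and the 50/50 split are illustrative assumptions.

# Minimal sketch: symmetric batch sampling from offline and online buffers.
# All names (sample_symmetric_batch, buffer field names) are hypothetical.
import numpy as np

def sample_symmetric_batch(offline_data, online_buffer, batch_size, rng):
    """Draw batch_size transitions: half from offline data, half from the online buffer."""
    half = batch_size // 2
    off_idx = rng.integers(0, len(offline_data["obs"]), size=half)
    on_idx = rng.integers(0, len(online_buffer["obs"]), size=batch_size - half)
    # Concatenate the two halves field by field into one training batch.
    return {
        key: np.concatenate(
            [offline_data[key][off_idx], online_buffer[key][on_idx]], axis=0
        )
        for key in offline_data
    }

# Toy usage: each buffer stores parallel arrays of transitions.
rng = np.random.default_rng(0)
offline_data = {"obs": np.zeros((1000, 4)), "action": np.zeros((1000, 1)),
                "reward": np.zeros(1000), "next_obs": np.zeros((1000, 4))}
online_buffer = {"obs": np.ones((200, 4)), "action": np.ones((200, 1)),
                 "reward": np.ones(200), "next_obs": np.ones((200, 4))}
batch = sample_symmetric_batch(offline_data, online_buffer, batch_size=256, rng=rng)
assert batch["obs"].shape == (256, 4)

Because the batch composition is fixed at sampling time, this kind of scheme adds no extra networks or losses and hence no additional computational overhead over the underlying off-policy algorithm.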

Cite

Text

Ball et al. "Efficient Online Reinforcement Learning with Offline Data." International Conference on Machine Learning, 2023.

Markdown

[Ball et al. "Efficient Online Reinforcement Learning with Offline Data." International Conference on Machine Learning, 2023.](https://mlanthology.org/icml/2023/ball2023icml-efficient/)

BibTeX

@inproceedings{ball2023icml-efficient,
  title     = {{Efficient Online Reinforcement Learning with Offline Data}},
  author    = {Ball, Philip J. and Smith, Laura and Kostrikov, Ilya and Levine, Sergey},
  booktitle = {International Conference on Machine Learning},
  year      = {2023},
  pages     = {1577--1594},
  volume    = {202},
  url       = {https://mlanthology.org/icml/2023/ball2023icml-efficient/}
}