Online Pre-Training for Offline-to-Online Reinforcement Learning

Abstract

Offline-to-online reinforcement learning (RL) aims to integrate the complementary strengths of offline and online RL by pre-training an agent offline and subsequently fine-tuning it through online interactions. However, recent studies reveal that offline pre-trained agents often underperform during online fine-tuning due to inaccurate value estimation caused by distribution shift, with random initialization proving more effective in certain cases. In this work, we propose a novel method, Online Pre-Training for Offline-to-Online RL (OPT), explicitly designed to address the issue of inaccurate value estimation in offline pre-trained agents. OPT introduces a new learning phase, Online Pre-Training, which allows the training of a new value function tailored specifically for effective online fine-tuning. Applying OPT to TD3 and SPOT yields an average 30% performance improvement across a wide range of D4RL environments, including MuJoCo, AntMaze, and Adroit.
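
The pipeline described in the abstract can be pictured, very roughly, as three phases: offline pre-training of an agent, an Online Pre-Training phase that trains a new value function before any policy fine-tuning, and then standard online fine-tuning. The sketch below is a minimal, self-contained illustration of that structure only; every detail (the names q_online and sample_batch, the step counts, the synthetic data, the single-critic TD update without a target network) is an assumption made for illustration and is not taken from the paper.

# Illustrative sketch of offline-to-online RL with an added Online
# Pre-Training phase. All specifics are placeholders, not the authors' code.

import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, GAMMA = 17, 6, 0.99

def mlp(in_dim, out_dim, hidden=256):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

# Phase 1 (assumed already done): actor and critic from offline pre-training.
policy = mlp(STATE_DIM, ACTION_DIM)           # placeholder for the offline-trained actor
q_offline = mlp(STATE_DIM + ACTION_DIM, 1)    # placeholder for the offline-trained critic

# Phase 2: Online Pre-Training trains a NEW value function from scratch,
# intended to be better suited to the online data distribution.
q_online = mlp(STATE_DIM + ACTION_DIM, 1)
q_optim = torch.optim.Adam(q_online.parameters(), lr=3e-4)

def sample_batch(batch_size=256):
    """Stand-in for sampling transitions from a replay buffer (random tensors here)."""
    s = torch.randn(batch_size, STATE_DIM)
    a = torch.randn(batch_size, ACTION_DIM)
    r = torch.randn(batch_size, 1)
    s_next = torch.randn(batch_size, STATE_DIM)
    done = torch.zeros(batch_size, 1)
    return s, a, r, s_next, done

def td_step(batch):
    """One TD(0) update of q_online toward a bootstrapped target.
    Simplified: no target network, single critic, frozen policy."""
    s, a, r, s_next, done = batch
    with torch.no_grad():
        a_next = policy(s_next)
        target = r + GAMMA * (1.0 - done) * q_online(torch.cat([s_next, a_next], -1))
    loss = nn.functional.mse_loss(q_online(torch.cat([s, a], -1)), target)
    q_optim.zero_grad()
    loss.backward()
    q_optim.step()
    return loss.item()

for step in range(1000):          # Online Pre-Training of the new critic
    td_step(sample_batch())

# Phase 3: online fine-tuning would then update the policy against q_online
# (for instance with standard TD3 actor-critic updates), which this sketch omits.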

Cite

Text

Shin et al. "Online Pre-Training for Offline-to-Online Reinforcement Learning." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Shin et al. "Online Pre-Training for Offline-to-Online Reinforcement Learning." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/shin2025icml-online/)

BibTeX

@inproceedings{shin2025icml-online,
  title     = {{Online Pre-Training for Offline-to-Online Reinforcement Learning}},
  author    = {Shin, Yongjae and Kim, Jeonghye and Jung, Whiyoung and Hong, Sunghoon and Yoon, Deunsol and Jang, Youngsoo and Kim, Geon-Hyeong and Chae, Jongseong and Sung, Youngchul and Lee, Kanghoon and Lim, Woohyung},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {55122--55144},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/shin2025icml-online/}
}