Bayesian Design Principles for Offline-to-Online Reinforcement Learning

Abstract

Offline reinforcement learning (RL) is crucial for real-world applications where exploration can be costly or unsafe. However, offline-learned policies are often suboptimal, and further online fine-tuning is required. In this paper, we tackle the fundamental dilemma of offline-to-online fine-tuning: if the agent remains pessimistic, it may fail to learn a better policy, while if it becomes optimistic directly, performance may suffer from a sudden drop. We show that Bayesian design principles are crucial in resolving this dilemma. Instead of adopting optimistic or pessimistic policies, the agent should act in a way that matches its belief in optimal policies. Such a probability-matching agent can avoid a sudden performance drop while still being guaranteed to find the optimal policy. Based on our theoretical findings, we introduce a novel algorithm that outperforms existing methods on various benchmarks, demonstrating the efficacy of our approach. Overall, the proposed approach provides a new perspective on offline-to-online RL that has the potential to enable more effective learning from offline data.
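To make the probability-matching idea concrete, the sketch below shows Thompson sampling on a Bernoulli bandit whose Beta posteriors are seeded with offline counts. It is only an illustration under stated assumptions, not the algorithm proposed in the paper: the bandit setting, the Beta(1, 1) prior, and the names `thompson_step` and `fine_tune` are all hypothetical choices made for this example.

```python
import random


def thompson_step(successes, failures, rng):
    """Choose an arm by drawing one plausible mean per arm from its posterior.

    Each arm is picked with the probability that it is optimal under the
    current belief, which is what "probability matching" refers to here.
    """
    # Beta(1, 1) prior (an assumption) updated with the observed counts.
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)


def fine_tune(true_means, offline_successes, offline_failures, steps, seed=0):
    """Run online fine-tuning that starts from offline success/failure counts.

    Returns how many times each arm was pulled during the online phase.
    """
    rng = random.Random(seed)
    successes = list(offline_successes)  # belief initialised from offline data
    failures = list(offline_failures)
    pulls = [0] * len(true_means)
    for _ in range(steps):
        arm = thompson_step(successes, failures, rng)
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls
```

For instance, `fine_tune([0.2, 0.8], [4, 0], [16, 0], steps=2000)` models offline data that covers only the weaker arm. The agent neither avoids the unseen arm (pure pessimism) nor commits to it blindly (pure optimism); it tries each arm in proportion to how likely that arm is to be the best one given the evidence so far.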

Cite

Text

Hu et al. "Bayesian Design Principles for Offline-to-Online Reinforcement Learning." International Conference on Machine Learning, 2024.

Markdown

[Hu et al. "Bayesian Design Principles for Offline-to-Online Reinforcement Learning." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/hu2024icml-bayesian/)

BibTeX

@inproceedings{hu2024icml-bayesian,
  title     = {{Bayesian Design Principles for Offline-to-Online Reinforcement Learning}},
  author    = {Hu, Hao and Yang, Yiqin and Ye, Jianing and Wu, Chengjie and Mai, Ziqing and Hu, Yujing and Lv, Tangjie and Fan, Changjie and Zhao, Qianchuan and Zhang, Chongjie},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {19491--19515},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/hu2024icml-bayesian/}
}