Uncertainty-Driven Pessimistic Q-Ensemble for Offline-to-Online Reinforcement Learning

Abstract

Reusing existing offline reinforcement learning (RL) agents is an emerging approach to reducing the dominant computational cost of exploration in many settings. Both offline samples and online interactions can be leveraged to effectively fine-tune pre-trained offline policies. In this paper, we propose incorporating a pessimistic Q-ensemble and an uncertainty quantification technique to fine-tune offline agents effectively. To stabilize online Q-function estimates during fine-tuning, the proposed method applies an uncertainty-based penalty when learning from a replay buffer that mixes online interactions collected by the ensemble agent with offline samples from the behavioral policies. On various robotic tasks from the D4RL benchmark, we show that our method outperforms state-of-the-art algorithms in terms of average return and sample efficiency.
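
The abstract describes two ingredients: an uncertainty penalty derived from a Q-ensemble and a replay buffer that mixes offline data with online interactions. The sketch below illustrates both in a minimal form; it is not the authors' implementation. The penalty form (ensemble mean minus beta times ensemble standard deviation), the coefficient beta, the online_ratio parameter, and the names pessimistic_target and MixedReplayBuffer are illustrative assumptions.

```python
# Minimal sketch (not the paper's code): a pessimistic Bellman target from a
# Q-ensemble with an uncertainty penalty, plus a replay buffer that samples
# from both offline data and online interactions.
import numpy as np

def pessimistic_target(rewards, next_q_ensemble, dones, gamma=0.99, beta=1.0):
    """Bellman target penalized by ensemble disagreement.

    next_q_ensemble: array of shape (num_ensemble, batch) holding each
    ensemble member's Q-estimate at the next state-action pair.
    """
    q_mean = next_q_ensemble.mean(axis=0)
    q_std = next_q_ensemble.std(axis=0)        # uncertainty estimate
    pessimistic_q = q_mean - beta * q_std      # lower-confidence-bound style penalty
    return rewards + gamma * (1.0 - dones) * pessimistic_q

class MixedReplayBuffer:
    """Draws a fixed fraction of each batch from online transitions and the
    rest from the offline dataset (the exact mixing scheme is an assumption)."""
    def __init__(self, offline_data, online_ratio=0.5, seed=0):
        self.offline = list(offline_data)
        self.online = []
        self.online_ratio = online_ratio
        self.rng = np.random.default_rng(seed)

    def add_online(self, transition):
        self.online.append(transition)

    def sample(self, batch_size):
        n_online = min(int(batch_size * self.online_ratio), len(self.online))
        n_offline = batch_size - n_online
        idx_off = self.rng.integers(0, len(self.offline), size=n_offline)
        batch = [self.offline[i] for i in idx_off]
        if n_online > 0:
            idx_on = self.rng.integers(0, len(self.online), size=n_online)
            batch += [self.online[i] for i in idx_on]
        return batch
```

In this reading, each fine-tuning step would sample a mixed batch from the buffer and regress every ensemble member toward the shared pessimistic target, so that high ensemble disagreement (large q_std) lowers the target and discourages overestimation on out-of-distribution online data.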

Cite

Text

Jang and Kim. "Uncertainty-Driven Pessimistic Q-Ensemble for Offline-to-Online Reinforcement Learning." NeurIPS 2022 Workshops: Offline_RL, 2022.

Markdown

[Jang and Kim. "Uncertainty-Driven Pessimistic Q-Ensemble for Offline-to-Online Reinforcement Learning." NeurIPS 2022 Workshops: Offline_RL, 2022.](https://mlanthology.org/neuripsw/2022/jang2022neuripsw-uncertaintydriven/)

BibTeX

@inproceedings{jang2022neuripsw-uncertaintydriven,
  title     = {{Uncertainty-Driven Pessimistic Q-Ensemble for Offline-to-Online Reinforcement Learning}},
  author    = {Jang, Ingook and Kim, Seonghyun},
  booktitle = {NeurIPS 2022 Workshops: Offline_RL},
  year      = {2022},
  url       = {https://mlanthology.org/neuripsw/2022/jang2022neuripsw-uncertaintydriven/}
}