Ensemble-Based Offline Reinforcement Learning with Adaptive Behavior Cloning

Abstract

In this work, we build upon the offline reinforcement learning algorithm TD3+BC \cite{fujimoto2021minimalist} and propose a model-free actor-critic algorithm with an adjustable behavior cloning (BC) term. We employ an ensemble of networks to quantify the uncertainty of the estimated value function, thus addressing the issue of overestimation. Moreover, we introduce a convenient and intuitively simple method for controlling the degree of BC: a Bernoulli random variable whose parameter is set by a user-specified confidence level for each offline dataset. Our proposed algorithm, named Ensemble-based actor-critic with Adaptive Behavior Cloning (EABC), is straightforward to implement, exhibits low variance, and achieves strong performance across all D4RL MuJoCo benchmarks.
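To make the two mechanisms described above concrete, the following is a minimal PyTorch sketch, not the authors' implementation, of a TD3+BC-style actor loss that combines an ensemble value estimate with a Bernoulli-gated BC term. All names (actor, critics, confidence, lam) are illustrative, and the min-over-ensemble pessimism is one common choice assumed here; the paper's exact uncertainty quantification may differ.

# Hypothetical sketch of EABC's actor update, assuming a TD3+BC-style
# objective; names and the min-ensemble choice are assumptions.
import torch

def actor_loss(actor, critics, states, actions, confidence, lam=2.5):
    # actor:      policy network mapping states to actions
    # critics:    list of Q-networks (the ensemble)
    # confidence: user-specified probability of applying the BC term
    # lam:        TD3+BC normalization coefficient (alpha in the paper)
    pi = actor(states)

    # Ensemble value estimate: take the minimum over critics as a
    # pessimistic value to counteract overestimation.
    q_values = torch.stack([q(states, pi) for q in critics], dim=0)
    q = q_values.min(dim=0).values

    # Normalize Q as in TD3+BC so the RL and BC terms share a scale.
    alpha = lam / q.abs().mean().detach()

    # Bernoulli gate: with probability `confidence`, keep the BC term
    # on this update; otherwise drop it and rely on the value estimate.
    gate = torch.bernoulli(torch.tensor(confidence))
    bc = ((pi - actions) ** 2).mean()

    return -(alpha * q).mean() + gate * bc

Under this sketch, a high confidence level keeps the policy close to the behavior policy on most updates (suitable for narrow or expert datasets), while a low confidence level lets the ensemble-based value estimate dominate.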

Cite

Text

Wang and Zhang. "Ensemble-Based Offline Reinforcement Learning with Adaptive Behavior Cloning." NeurIPS 2024 Workshops: AFM, 2024.

Markdown

[Wang and Zhang. "Ensemble-Based Offline Reinforcement Learning with Adaptive Behavior Cloning." NeurIPS 2024 Workshops: AFM, 2024.](https://mlanthology.org/neuripsw/2024/wang2024neuripsw-ensemblebased/)

BibTeX

@inproceedings{wang2024neuripsw-ensemblebased,
  title     = {{Ensemble-Based Offline Reinforcement Learning with Adaptive Behavior Cloning}},
  author    = {Wang, Danyang and Zhang, Lingsong},
  booktitle = {NeurIPS 2024 Workshops: AFM},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/wang2024neuripsw-ensemblebased/}
}