Bayesian Q-Learning with Imperfect Expert Demonstrations
Abstract
Guided exploration with expert demonstrations improves data efficiency for reinforcement learning, but current algorithms often overuse expert information. We propose a novel algorithm to speed up Q-learning with the help of a limited amount of imperfect expert demonstrations. The algorithm is based on a Bayesian framework that models suboptimal expert actions and derives the Q-value update rule by maximizing the posterior probability. It weights expert information by the uncertainty of the learned Q-values, avoiding excessive reliance on expert data and gradually reducing the use of uninformative demonstrations. Experimentally, we evaluate our approach on a sparse-reward chain environment and six more complex Atari games with delayed rewards. The proposed method achieves better results than Deep Q-learning from Demonstrations (Hester et al., 2017) in most environments.
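To make the uncertainty-weighting idea concrete, here is a minimal tabular sketch, not the authors' actual update rule: each Q(s, a) carries a Gaussian-style posterior (mean and variance), standard TD learning updates the mean, and an expert demonstration nudges the Q-values with a weight that shrinks as the learner becomes more certain. All names, constants, and the variance schedule below are illustrative assumptions.

```python
import numpy as np

n_states, n_actions = 10, 4
q_mean = np.zeros((n_states, n_actions))   # posterior mean of Q(s, a)
q_var = np.ones((n_states, n_actions))     # posterior variance of Q(s, a)
alpha, gamma = 0.1, 0.99                   # learning rate, discount factor
margin = 0.5                               # hypothetical expert margin

def td_update(s, a, r, s_next, done):
    """Standard Q-learning step on the posterior mean; variance shrinks with experience."""
    target = r + (0.0 if done else gamma * q_mean[s_next].max())
    q_mean[s, a] += alpha * (target - q_mean[s, a])
    q_var[s, a] *= 0.99                    # crude proxy for growing certainty (assumption)

def expert_update(s, a_expert):
    """Push the expert's action above the greedy alternative, weighted by uncertainty.

    The factor q_var[s, a_expert] stands in for "trust the expert more while the
    learned Q-value is still uncertain"; as the variance decays, the expert term
    fades out, mirroring the behavior described in the abstract.
    """
    a_greedy = q_mean[s].argmax()
    if a_greedy != a_expert:
        gap = q_mean[s, a_greedy] + margin - q_mean[s, a_expert]
        q_mean[s, a_expert] += alpha * q_var[s, a_expert] * max(gap, 0.0)
```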
Cite
Text
Che et al. "Bayesian Q-Learning with Imperfect Expert Demonstrations." NeurIPS 2022 Workshops: DeepRL, 2022.
Markdown
[Che et al. "Bayesian Q-Learning with Imperfect Expert Demonstrations." NeurIPS 2022 Workshops: DeepRL, 2022.](https://mlanthology.org/neuripsw/2022/che2022neuripsw-bayesian/)
BibTeX
@inproceedings{che2022neuripsw-bayesian,
title = {{Bayesian Q-Learning with Imperfect Expert Demonstrations}},
author = {Che, Fengdi and Zhu, Xiru and Precup, Doina and Meger, David and Dudek, Gregory},
booktitle = {NeurIPS 2022 Workshops: DeepRL},
year = {2022},
url = {https://mlanthology.org/neuripsw/2022/che2022neuripsw-bayesian/}
}