Reinforcement Learning from Imperfect Demonstrations Under Soft Expert Guidance

Abstract

In this paper, we study Reinforcement Learning from Demonstrations (RLfD), which improves the exploration efficiency of Reinforcement Learning (RL) by providing expert demonstrations. Most existing RLfD methods require demonstrations to be perfect and sufficient, a requirement that is unrealistic in practice. To handle imperfect demonstrations, we first formally define an imperfect expert setting for RLfD, and then point out that previous methods suffer from two issues, one concerning optimality and the other convergence. Building on our theoretical findings, we tackle these two issues by treating the expert guidance as a soft constraint that regulates the agent's policy exploration, which leads to a constrained optimization problem. We further demonstrate that this problem can be solved efficiently by performing a local linear search on its dual form. Extensive empirical evaluations on a comprehensive collection of benchmarks indicate that our method attains consistent improvement over other RLfD counterparts.
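
As a reading aid, the soft-constraint idea in the abstract can be sketched in a generic form. The abstract does not state the formulation, so every symbol below is an assumption made for illustration: $J(\pi_\theta)$ is the expected return of the agent policy, $D(\pi_\theta, \pi_E)$ is some discrepancy between the agent and the (imperfect) expert policy $\pi_E$, $d$ is a tolerance, and $\lambda$ is a Lagrange multiplier. The paper's actual objective, discrepancy measure, and search procedure may differ.

```latex
% Generic constrained policy optimization (illustrative assumption, not the paper's exact form)
\begin{align}
  \max_{\theta}\quad & J(\pi_\theta)
  \qquad \text{s.t.}\quad D(\pi_\theta, \pi_E) \le d, \\[4pt]
  % Lagrangian with multiplier \lambda \ge 0
  \mathcal{L}(\theta, \lambda) \;=\; & J(\pi_\theta) \;-\; \lambda \bigl( D(\pi_\theta, \pi_E) - d \bigr), \\[4pt]
  % Dual problem: optimize the policy for fixed \lambda, then search over \lambda
  \min_{\lambda \ge 0}\; \max_{\theta}\quad & \mathcal{L}(\theta, \lambda).
\end{align}
```

In this generic reading, the constraint is "soft" in that the agent only needs to stay within tolerance $d$ of the expert rather than match it exactly, which leaves room to outperform imperfect demonstrations.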

Cite

Text

Jing et al. "Reinforcement Learning from Imperfect Demonstrations Under Soft Expert Guidance." AAAI Conference on Artificial Intelligence, 2020. doi:10.1609/AAAI.V34I04.5953

Markdown

[Jing et al. "Reinforcement Learning from Imperfect Demonstrations Under Soft Expert Guidance." AAAI Conference on Artificial Intelligence, 2020.](https://mlanthology.org/aaai/2020/jing2020aaai-reinforcement/) doi:10.1609/AAAI.V34I04.5953

BibTeX

@inproceedings{jing2020aaai-reinforcement,
  title     = {{Reinforcement Learning from Imperfect Demonstrations Under Soft Expert Guidance}},
  author    = {Jing, Mingxuan and Ma, Xiaojian and Huang, Wenbing and Sun, Fuchun and Yang, Chao and Fang, Bin and Liu, Huaping},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2020},
  pages     = {5109-5116},
  doi       = {10.1609/AAAI.V34I04.5953},
  url       = {https://mlanthology.org/aaai/2020/jing2020aaai-reinforcement/}
}