BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning

Abstract

There has recently been a surge in research in batch Deep Reinforcement Learning (DRL), which aims to learn a high-performing policy from a given dataset without additional interactions with the environment. We propose a new algorithm, Best-Action Imitation Learning (BAIL), which strives for both simplicity and performance. BAIL learns a V function, uses the V function to select actions it believes to be high-performing, and then uses those actions to train a policy network using imitation learning. For the MuJoCo benchmark, we provide a comprehensive experimental study of BAIL, comparing its performance to four other batch Q-learning and imitation-learning schemes for a large variety of batch datasets. Our experiments show that BAIL achieves much higher performance than the other schemes and is also computationally much faster than the batch Q-learning schemes.
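The three steps named in the abstract (learn a V function, select actions, imitate them) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how V is trained or how actions are selected, so the sketch assumes a linear least-squares fit to Monte Carlo returns as the V function, a "keep the top fraction by return minus V(s)" rule for selection, and a linear least-squares fit as a stand-in for the policy network. All function names and the `keep_fraction` parameter are hypothetical.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99):
    """Monte Carlo return G_t for every step of a single episode."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def fit_linear(inputs, targets):
    """Least-squares fit with a bias term (assumed stand-in for a network)."""
    design = np.hstack([inputs, np.ones((len(inputs), 1))])
    weights, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return weights


def predict_linear(weights, inputs):
    design = np.hstack([inputs, np.ones((len(inputs), 1))])
    return design @ weights


def select_best_actions(states, actions, returns, keep_fraction=0.25):
    """Steps 1 and 2: fit V(s), then keep pairs whose return most exceeds V(s).

    The top-fraction rule is an assumption made for this sketch.
    """
    value_weights = fit_linear(states, returns)
    margin = returns - predict_linear(value_weights, states)
    threshold = np.quantile(margin, 1.0 - keep_fraction)
    mask = margin >= threshold
    return states[mask], actions[mask], mask


def bail_sketch(states, actions, returns, keep_fraction=0.25):
    """Step 3: imitation learning (here, regression) on the selected pairs."""
    best_states, best_actions, mask = select_best_actions(
        states, actions, returns, keep_fraction
    )
    policy_weights = fit_linear(best_states, best_actions)
    return policy_weights, mask


# Usage on synthetic data (shapes only; not a MuJoCo dataset).
rng = np.random.default_rng(0)
demo_states = rng.normal(size=(200, 3))
demo_actions = rng.normal(size=(200, 2))
demo_returns = rng.normal(size=200)
demo_policy, demo_mask = bail_sketch(demo_states, demo_actions, demo_returns)
```

Selecting data first and then running plain supervised learning keeps the policy update free of bootstrapped Q-value targets, which is consistent with the abstract's emphasis on simplicity and computational speed.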

Cite

Text

Chen et al. "BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning." Neural Information Processing Systems, 2020.

Markdown

[Chen et al. "BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/chen2020neurips-bail/)

BibTeX

@inproceedings{chen2020neurips-bail,
  title     = {{BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning}},
  author    = {Chen, Xinyue and Zhou, Zijian and Wang, Zheng and Wang, Che and Wu, Yanqiu and Ross, Keith},
  booktitle = {Neural Information Processing Systems},
  year      = {2020},
  url       = {https://mlanthology.org/neurips/2020/chen2020neurips-bail/}
}