Self-Imitation Learning via Generalized Lower Bound Q-Learning

Abstract

Self-imitation learning motivated by lower-bound Q-learning is a novel and effective approach for off-policy learning. In this work, we propose an n-step lower bound that generalizes the original return-based lower-bound Q-learning, and introduce a new family of self-imitation learning algorithms. To formally motivate the potential performance gains of self-imitation learning, we show that n-step lower-bound Q-learning achieves a trade-off between fixed-point bias and contraction rate, drawing close connections to the popular uncorrected n-step Q-learning. Finally, we show that n-step lower-bound Q-learning is a more robust alternative to return-based self-imitation learning and uncorrected n-step Q-learning across a wide range of benchmark tasks.
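
To make the construction concrete, below is a minimal NumPy sketch of one plausible form of the n-step lower-bound target described in the abstract: an uncorrected n-step return bootstrapped with max_a Q, used as a lower bound through a one-sided (clipped) regression loss in the spirit of return-based self-imitation learning. The function names and the plain-NumPy setting are illustrative assumptions, not the paper's reference implementation.

import numpy as np

def nstep_lower_bound_targets(rewards, bootstrap_q, gamma, n):
    """Compute n-step targets y_t = sum_{i<n} gamma^i r_{t+i} + gamma^n max_a Q(s_{t+n}, a).

    rewards:     shape (T,), rewards r_0 ... r_{T-1} from one trajectory
    bootstrap_q: shape (T + 1,), max_a Q(s_t, a) for t = 0 ... T
                 (bootstrap_q[T] should be 0 for a terminal state)
    """
    T = len(rewards)
    targets = np.empty(T)
    for t in range(T):
        horizon = min(n, T - t)  # truncate the lookahead at the episode end
        discounts = gamma ** np.arange(horizon)
        targets[t] = np.dot(discounts, rewards[t:t + horizon])
        targets[t] += gamma ** horizon * bootstrap_q[t + horizon]
    return targets

def lower_bound_q_loss(q_values, targets):
    """Self-imitation-style one-sided regression: push Q up only when the
    n-step target exceeds the current estimate, treating the target as a
    lower bound on the optimal Q-value."""
    gap = np.maximum(targets - q_values, 0.0)
    return 0.5 * np.mean(gap ** 2)

# Example: a 4-step trajectory with a 2-step lower bound
rewards = np.array([1.0, 0.0, 1.0, 1.0])
bootstrap_q = np.array([0.5, 0.4, 0.6, 0.3, 0.0])  # max_a Q at s_0..s_4; terminal -> 0
y = nstep_lower_bound_targets(rewards, bootstrap_q, gamma=0.99, n=2)
loss = lower_bound_q_loss(bootstrap_q[:4], y)

Under this reading, n = 1 recovers a one-sided variant of ordinary Q-learning, while letting n span the whole episode recovers the return-based self-imitation target, which is consistent with the abstract's claim that the n-step bound interpolates between the two.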

Cite

Text

Tang. "Self-Imitation Learning via Generalized Lower Bound Q-Learning." Neural Information Processing Systems, 2020.

Markdown

[Tang. "Self-Imitation Learning via Generalized Lower Bound Q-Learning." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/tang2020neurips-selfimitation/)

BibTeX

@inproceedings{tang2020neurips-selfimitation,
  title     = {{Self-Imitation Learning via Generalized Lower Bound Q-Learning}},
  author    = {Tang, Yunhao},
  booktitle = {Neural Information Processing Systems},
  year      = {2020},
  url       = {https://mlanthology.org/neurips/2020/tang2020neurips-selfimitation/}
}