Revisiting Peng’s Q($\lambda$) for Modern Reinforcement Learning

Abstract

Off-policy multi-step reinforcement learning algorithms fall into two classes: conservative algorithms, which actively cut traces, and non-conservative algorithms, which do not. Recently, Munos et al. (2016) proved the convergence of conservative algorithms to an optimal Q-function. In contrast, non-conservative algorithms are thought to be unsafe and come with limited or no theoretical guarantees. Nonetheless, recent studies have shown that non-conservative algorithms empirically outperform conservative ones. Motivated by these empirical results and the lack of theory, we carry out theoretical analyses of Peng’s Q($\lambda$), a representative example of non-conservative algorithms. We prove that \emph{it also converges to an optimal policy}, provided that the behavior policy slowly tracks a greedy policy in a way similar to conservative policy iteration. Such a result has long been conjectured but never proven. We also experiment with Peng’s Q($\lambda$) in complex continuous control tasks, confirming that Peng’s Q($\lambda$) often outperforms conservative algorithms despite its simplicity. These results indicate that Peng’s Q($\lambda$), long thought to be unsafe, is a theoretically sound and practically effective algorithm.
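For concreteness on the trace-cutting distinction the abstract draws: Peng’s Q($\lambda$) forms its target as a $\lambda$-weighted mixture of uncorrected $n$-step returns that bootstrap with a max, and, unlike conservative methods such as Retrace (Munos et al., 2016), it never cuts the trace at off-policy actions. The target satisfies the backward recursion $G_t = r_t + \gamma\,[(1-\lambda)\max_a Q(s_{t+1}, a) + \lambda G_{t+1}]$. Below is a minimal NumPy sketch of this recursion; it is not code from the paper, the function name and array layout are our own, and terminal handling is simplified by assuming `q_next_max` is zero at terminal states.

```python
import numpy as np

def peng_q_lambda_targets(rewards, q_next_max, gamma=0.99, lam=0.9):
    """Compute Peng's Q(lambda) targets over a length-T trajectory.

    rewards    : shape (T,), reward r_t observed at each step
    q_next_max : shape (T,), max_a Q(s_{t+1}, a) at each step
                 (assumed zero at terminal states)
    Returns shape (T,) targets via the recursion
        G_t = r_t + gamma * ((1 - lam) * max_a Q(s_{t+1}, a) + lam * G_{t+1}),
    a lambda-mixture of uncorrected n-step returns with max bootstrapping;
    no trace is cut when the logged action is off-policy.
    """
    T = len(rewards)
    targets = np.empty(T)
    # At the final step there is no continuation, so bootstrap fully:
    # (1 - lam) * maxQ + lam * maxQ = maxQ.
    next_return = q_next_max[-1]
    for t in reversed(range(T)):
        targets[t] = rewards[t] + gamma * (
            (1.0 - lam) * q_next_max[t] + lam * next_return
        )
        next_return = targets[t]
    return targets
```

Setting `lam=0` recovers the one-step Q-learning target, while `lam=1` yields an uncorrected Monte-Carlo-style return; the paper’s convergence result concerns this family when the behavior policy slowly tracks the greedy policy.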

Cite

Text

Kozuno et al. "Revisiting Peng’s Q($\lambda$) for Modern Reinforcement Learning." International Conference on Machine Learning, 2021.

Markdown

[Kozuno et al. "Revisiting Peng’s Q($\lambda$) for Modern Reinforcement Learning." International Conference on Machine Learning, 2021.](https://mlanthology.org/icml/2021/kozuno2021icml-revisiting/)

BibTeX

@inproceedings{kozuno2021icml-revisiting,
  title     = {{Revisiting Peng's Q($\lambda$) for Modern Reinforcement Learning}},
  author    = {Kozuno, Tadashi and Tang, Yunhao and Rowland, Mark and Munos, Remi and Kapturowski, Steven and Dabney, Will and Valko, Michal and Abel, David},
  booktitle = {International Conference on Machine Learning},
  year      = {2021},
  pages     = {5794--5804},
  volume    = {139},
  url       = {https://mlanthology.org/icml/2021/kozuno2021icml-revisiting/}
}