Multiple-Step Greedy Policies in Approximate and Online Reinforcement Learning

Abstract

Multiple-step lookahead policies have demonstrated high empirical competence in Reinforcement Learning, via the use of Monte Carlo Tree Search or Model Predictive Control. In a recent work (Efroni et al., 2018), multiple-step greedy policies and their use in vanilla Policy Iteration algorithms were proposed and analyzed. In this work, we study multiple-step greedy algorithms in more practical setups. We begin by highlighting a counter-intuitive difficulty, arising with soft-policy updates: even in the absence of approximations, and contrary to the 1-step-greedy case, monotonic policy improvement is not guaranteed unless the update stepsize is sufficiently large. Taking particular care about this difficulty, we formulate and analyze online and approximate algorithms that use such a multi-step greedy operator.
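To make the soft-update difficulty concrete, the following is a brief sketch written in the κ-greedy notation introduced in the earlier Efroni et al. (2018) work cited above; the symbols (γ for the discount, v^{π_k} for the current policy's value, α for the stepsize) follow standard RL conventions, and the exact stepsize threshold established in the paper is not reproduced here. A κ-greedy policy π' with respect to v^{π_k} solves a surrogate problem with discount γκ and a shaped reward, and the soft update mixes it with the current policy:

\[
\pi' \in \arg\max_{\pi} \; \mathbb{E}^{\pi}\!\left[\sum_{t \ge 0} (\gamma\kappa)^{t}\,\Big( r(s_t, a_t) + \gamma(1-\kappa)\, v^{\pi_k}(s_{t+1}) \Big)\right],
\qquad
\pi_{k+1} = (1-\alpha)\,\pi_k + \alpha\,\pi'.
\]

For κ = 0, π' reduces to the usual 1-step greedy policy, and the classical policy-improvement argument gives v^{π_{k+1}} ≥ v^{π_k} for any α ∈ (0, 1]. The abstract's point is that for κ > 0 this monotonicity can fail when α is too small, which is why the online and approximate algorithms analyzed in the paper take particular care with the stepsize.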

Cite

Text

Efroni et al. "Multiple-Step Greedy Policies in Approximate and Online Reinforcement Learning." Neural Information Processing Systems, 2018.

Markdown

[Efroni et al. "Multiple-Step Greedy Policies in Approximate and Online Reinforcement Learning." Neural Information Processing Systems, 2018.](https://mlanthology.org/neurips/2018/efroni2018neurips-multiplestep/)

BibTeX

@inproceedings{efroni2018neurips-multiplestep,
  title     = {{Multiple-Step Greedy Policies in Approximate and Online Reinforcement Learning}},
  author    = {Efroni, Yonathan and Dalal, Gal and Scherrer, Bruno and Mannor, Shie},
  booktitle = {Neural Information Processing Systems},
  year      = {2018},
  pages     = {5238--5247},
  url       = {https://mlanthology.org/neurips/2018/efroni2018neurips-multiplestep/}
}