Planning in Entropy-Regularized Markov Decision Processes and Games

Abstract

We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment. SmoothCruiser exploits the smoothness of the Bellman operator induced by the regularization to achieve a problem-independent sample complexity of order $\tilde{\mathcal{O}}(1/\epsilon^4)$ for a desired accuracy $\epsilon$, whereas for non-regularized settings there are no known algorithms with guaranteed polynomial sample complexity in the worst case.
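The smoothness the abstract refers to comes from entropy regularization replacing the hard max in the Bellman backup with a log-sum-exp (softmax) over actions, which is a smooth, contractive operator. Below is a minimal, hedged sketch of one such soft Bellman backup on a small hypothetical tabular MDP; the function name, the temperature `tau`, and the random 2-state example are illustrative assumptions, not the paper's code or algorithm:

```python
import numpy as np

def soft_bellman_backup(V, P, R, tau=1.0, gamma=0.9):
    """One entropy-regularized (soft) Bellman backup.

    Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] * V[s']
    V'[s]   = tau * log sum_a exp(Q[s, a] / tau)   # smooth stand-in for max_a Q[s, a]
    """
    Q = R + gamma * P @ V  # P: (S, A, S), V: (S,) -> Q: (S, A)
    # Numerically stable log-sum-exp over the action axis
    Qmax = Q.max(axis=1, keepdims=True)
    return (Qmax + tau * np.log(np.exp((Q - Qmax) / tau)
                                .sum(axis=1, keepdims=True))).ravel()

# Hypothetical 2-state, 2-action MDP, purely for illustration
S, A = 2, 2
rng = np.random.default_rng(0)
P = rng.random((S, A, S))
P /= P.sum(axis=-1, keepdims=True)  # normalize transition probabilities
R = rng.random((S, A))

V = np.zeros(S)
for _ in range(200):  # the soft operator is a gamma-contraction, so this converges
    V = soft_bellman_backup(V, P, R)
```

As `tau` decreases toward zero, the log-sum-exp backup approaches the standard (non-smooth) max backup, recovering the unregularized Bellman operator.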

Cite

Text

Grill et al. "Planning in Entropy-Regularized Markov Decision Processes and Games." Neural Information Processing Systems, 2019.

Markdown

[Grill et al. "Planning in Entropy-Regularized Markov Decision Processes and Games." Neural Information Processing Systems, 2019.](https://mlanthology.org/neurips/2019/grill2019neurips-planning/)

BibTeX

@inproceedings{grill2019neurips-planning,
  title     = {{Planning in Entropy-Regularized Markov Decision Processes and Games}},
  author    = {Grill, Jean-Bastien and Domingues, Omar Darwiche and Menard, Pierre and Munos, Remi and Valko, Michal},
  booktitle = {Neural Information Processing Systems},
  year      = {2019},
  pages     = {12404--12413},
  url       = {https://mlanthology.org/neurips/2019/grill2019neurips-planning/}
}