Planning in Entropy-Regularized Markov Decision Processes and Games
Abstract
We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the MDP. SmoothCruiser exploits the smoothness of the Bellman operator induced by the regularization to achieve a problem-independent sample complexity of order $\tilde{\mathcal{O}}(1/\epsilon^4)$ for a desired accuracy $\epsilon$, whereas in non-regularized settings no known algorithm has guaranteed polynomial sample complexity in the worst case.
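To illustrate the smoothing the abstract refers to, here is a minimal sketch of an entropy-regularized (soft) Bellman backup on a toy randomly generated MDP. This is only an illustration of the regularized operator, not the SmoothCruiser algorithm itself; the MDP sizes, temperature `lam`, and all names are made up for the example.

```python
import numpy as np

# Toy entropy-regularized MDP (illustrative; not the SmoothCruiser algorithm).
# Entropy regularization replaces the max over actions in the Bellman backup
# with a log-sum-exp at temperature lam, which makes the operator smooth.
rng = np.random.default_rng(0)
S, A = 4, 3            # number of states and actions (arbitrary toy sizes)
gamma, lam = 0.9, 0.5  # discount factor, regularization temperature

R = rng.uniform(0, 1, size=(S, A))     # rewards r(s, a)
P = rng.uniform(size=(S, A, S))
P /= P.sum(axis=2, keepdims=True)      # transition kernel P(s' | s, a)

def soft_bellman(V):
    # Q(s, a) = r(s, a) + gamma * E_{s' ~ P(.|s,a)}[V(s')]
    Q = R + gamma * P @ V
    # Smoothed maximum over actions: lam * log sum_a exp(Q(s, a) / lam)
    return lam * np.log(np.exp(Q / lam).sum(axis=1))

# The smoothed operator is still a gamma-contraction, so iterating it
# converges to the regularized value function.
V = np.zeros(S)
for _ in range(200):
    V = soft_bellman(V)
```

As `lam` tends to 0 the log-sum-exp approaches the hard max and the standard Bellman operator is recovered; larger `lam` trades off accuracy of the unregularized value for smoothness.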
Cite
Grill et al. "Planning in Entropy-Regularized Markov Decision Processes and Games." Neural Information Processing Systems, 2019.

BibTeX
@inproceedings{grill2019neurips-planning,
title = {{Planning in Entropy-Regularized Markov Decision Processes and Games}},
author = {Grill, Jean-Bastien and Domingues, Omar Darwiche and Menard, Pierre and Munos, Remi and Valko, Michal},
booktitle = {Neural Information Processing Systems},
year = {2019},
pages = {12404--12413},
url = {https://mlanthology.org/neurips/2019/grill2019neurips-planning/}
}