Adaptive Sampling for Best Policy Identification in Markov Decision Processes

ICML 2021 pp. 7459-7468

/icml/2021/marjani2021icml-adaptive/

Abstract

We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model. The objective is to devise a learning algorithm returning the best policy as early as possible. We first derive a problem-specific lower bound of the sample complexity satisfied by any learning algorithm. This lower bound corresponds to an optimal sample allocation that solves a non-convex program, and hence, is hard to exploit in the design of efficient algorithms. We then provide a simple and tight upper bound of the sample complexity lower bound, whose corresponding nearly-optimal sample allocation becomes explicit. The upper bound depends on specific functionals of the MDP such as the sub-optimality gaps and the variance of the next-state value function, and thus really captures the hardness of the MDP. Finally, we devise KLB-TS (KL Ball Track-and-Stop), an algorithm tracking this nearly-optimal allocation, and provide asymptotic guarantees for its sample complexity (both almost surely and in expectation). The advantages of KLB-TS against state-of-the-art algorithms are discussed and illustrated numerically.

PDF ICML Semantic Scholar

Cite

Text

Al Marjani and Proutiere. "Adaptive Sampling for Best Policy Identification in Markov Decision Processes." International Conference on Machine Learning, 2021.

Markdown

[Al Marjani and Proutiere. "Adaptive Sampling for Best Policy Identification in Markov Decision Processes." International Conference on Machine Learning, 2021.](https://mlanthology.org/icml/2021/marjani2021icml-adaptive/)

BibTeX

@inproceedings{marjani2021icml-adaptive,
  title     = {{Adaptive Sampling for Best Policy Identification in Markov Decision Processes}},
  author    = {Al Marjani, Aymen and Proutiere, Alexandre},
  booktitle = {International Conference on Machine Learning},
  year      = {2021},
  pages     = {7459-7468},
  volume    = {139},
  url       = {https://mlanthology.org/icml/2021/marjani2021icml-adaptive/}
}