Adaptive Sampling for Best Policy Identification in Markov Decision Processes

Abstract

We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model. The objective is to devise a learning algorithm returning the best policy as early as possible. We first derive a problem-specific lower bound of the sample complexity satisfied by any learning algorithm. This lower bound corresponds to an optimal sample allocation that solves a non-convex program, and hence, is hard to exploit in the design of efficient algorithms. We then provide a simple and tight upper bound of the sample complexity lower bound, whose corresponding nearly-optimal sample allocation becomes explicit. The upper bound depends on specific functionals of the MDP such as the sub-optimality gaps and the variance of the next-state value function, and thus really captures the hardness of the MDP. Finally, we devise KLB-TS (KL Ball Track-and-Stop), an algorithm tracking this nearly-optimal allocation, and provide asymptotic guarantees for its sample complexity (both almost surely and in expectation). The advantages of KLB-TS against state-of-the-art algorithms are discussed and illustrated numerically.

Cite

Text

Al Marjani and Proutiere. "Adaptive Sampling for Best Policy Identification in Markov Decision Processes." International Conference on Machine Learning, 2021.

Markdown

[Al Marjani and Proutiere. "Adaptive Sampling for Best Policy Identification in Markov Decision Processes." International Conference on Machine Learning, 2021.](https://mlanthology.org/icml/2021/marjani2021icml-adaptive/)

BibTeX

@inproceedings{marjani2021icml-adaptive,
  title     = {{Adaptive Sampling for Best Policy Identification in Markov Decision Processes}},
  author    = {Al Marjani, Aymen and Proutiere, Alexandre},
  booktitle = {International Conference on Machine Learning},
  year      = {2021},
  pages     = {7459-7468},
  volume    = {139},
  url       = {https://mlanthology.org/icml/2021/marjani2021icml-adaptive/}
}