Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model

Abstract

We investigate the sample efficiency of reinforcement learning in a $\gamma$-discounted infinite-horizon Markov decision process (MDP) with state space $S$ and action space $A$, assuming access to a generative model. Although a number of prior works have tackled this problem, a complete picture of the trade-offs between sample complexity and statistical accuracy is yet to be determined. In particular, prior results suffer from a sample size barrier, in the sense that their claimed statistical guarantees hold only when the sample size exceeds at least $ |S| |A| / (1-\gamma)^2 $ (up to some log factor). The current paper overcomes this barrier by certifying the minimax optimality of model-based reinforcement learning as soon as the sample size exceeds the order of $ |S| |A| / (1-\gamma) $ (modulo some log factor). More specifically, a perturbed model-based planning algorithm provably finds an $\epsilon$-optimal policy with on the order of $ |S| |A| / ((1-\gamma)^3\epsilon^2 ) $ samples (up to a log factor) for any $0< \epsilon < 1/(1-\gamma)$. Along the way, we derive improved (instance-dependent) guarantees for model-based policy evaluation. To the best of our knowledge, this work provides the first minimax-optimal guarantee with a generative model that accommodates the entire range of sample sizes (below which finding a meaningful policy is information-theoretically impossible).
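
The perturbed model-based planning idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not spell out the algorithm, so the sketch assumes that each state-action pair is queried the same number of times, that the perturbation is a small uniform noise added to the rewards, and that the empirical MDP is solved by plain value iteration. The names `perturbed_model_based_planning`, `toy_sampler`, `n_samples` and `xi`, and the two-state toy MDP, are hypothetical and chosen only for illustration.

```python
import random


def perturbed_model_based_planning(sample_next_state, rewards, n_states, n_actions,
                                   gamma, n_samples, xi=1e-3, tol=1e-8, seed=0):
    """Plan in an empirical MDP built from generative-model samples.

    Assumptions of this sketch (not taken from the abstract): uniform reward
    perturbation on [0, xi] and value iteration as the planner.
    """
    rng = random.Random(seed)

    # Empirical transition kernel: n_samples generative-model calls per (s, a).
    p_hat = [[[0.0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(n_samples):
                s_next = sample_next_state(s, a, rng)
                p_hat[s][a][s_next] += 1.0 / n_samples

    # Slightly perturbed rewards (assumed form of the perturbation).
    r_pert = [[rewards[s][a] + rng.uniform(0.0, xi) for a in range(n_actions)]
              for s in range(n_states)]

    # Value iteration on the empirical, perturbed MDP.
    v = [0.0] * n_states
    while True:
        q = [[r_pert[s][a] + gamma * sum(p * v[t] for t, p in enumerate(p_hat[s][a]))
              for a in range(n_actions)] for s in range(n_states)]
        v_new = [max(row) for row in q]
        converged = max(abs(x - y) for x, y in zip(v_new, v)) < tol
        v = v_new
        if converged:
            break

    # Greedy policy with respect to the final Q-values.
    policy = [max(range(n_actions), key=lambda a: q[s][a]) for s in range(n_states)]
    return policy, v


def toy_sampler(s, a, rng):
    """Hypothetical 2-state generative model: the intended move succeeds w.p. 0.9."""
    intended = 1 if (s, a) in {(0, 1), (1, 0)} else 0
    return intended if rng.random() < 0.9 else 1 - intended


# Reward 1 only for taking action 0 in state 1; all other rewards are 0.
toy_rewards = [[0.0, 0.0], [1.0, 0.0]]
policy, values = perturbed_model_based_planning(
    toy_sampler, toy_rewards, n_states=2, n_actions=2, gamma=0.9, n_samples=200)
print(policy, values)
```

In this toy problem the total number of generative-model calls is `n_states * n_actions * n_samples`, which is the quantity the sample complexity bounds in the abstract refer to.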

Cite

Text

Li et al. "Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model." Neural Information Processing Systems, 2020.

Markdown

[Li et al. "Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/li2020neurips-breaking/)

BibTeX

@inproceedings{li2020neurips-breaking,
  title     = {{Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model}},
  author    = {Li, Gen and Wei, Yuting and Chi, Yuejie and Gu, Yuantao and Chen, Yuxin},
  booktitle = {Neural Information Processing Systems},
  year      = {2020},
  url       = {https://mlanthology.org/neurips/2020/li2020neurips-breaking/}
}