Minimax PAC Bounds on the Sample Complexity of Reinforcement Learning with a Generative Model

Abstract

We consider the problems of learning the optimal action-value function and the optimal policy in discounted-reward Markov decision processes (MDPs). We prove new PAC bounds on the sample-complexity of two well-known model-based reinforcement learning (RL) algorithms in the presence of a generative model of the MDP: value iteration and policy iteration. The first result indicates that for an MDP with N state-action pairs and the discount factor γ ∈[0,1) only O ( N log( N / δ )/((1− γ )^3 ε ^2)) state-transition samples are required to find an ε -optimal estimation of the action-value function with the probability (w.p.) 1− δ . Further, we prove that, for small values of ε , an order of O ( N log( N / δ )/((1− γ )^3 ε ^2)) samples is required to find an ε -optimal policy w.p. 1− δ . We also prove a matching lower bound of Θ ( N log( N / δ )/((1− γ )^3 ε ^2)) on the sample complexity of estimating the optimal action-value function with ε accuracy. To the best of our knowledge, this is the first minimax result on the sample complexity of RL: the upper bounds match the lower bound in terms of  N , ε , δ and 1/(1− γ ) up to a constant factor. Also, both our lower bound and upper bound improve on the state-of-the-art in terms of their dependence on 1/(1− γ ).

Cite

Text

Azar et al. "Minimax PAC Bounds on the Sample Complexity of Reinforcement Learning with a Generative Model." Machine Learning, 2013. doi:10.1007/S10994-013-5368-1

Markdown

[Azar et al. "Minimax PAC Bounds on the Sample Complexity of Reinforcement Learning with a Generative Model." Machine Learning, 2013.](https://mlanthology.org/mlj/2013/azar2013mlj-minimax/) doi:10.1007/S10994-013-5368-1

BibTeX

@article{azar2013mlj-minimax,
  title     = {{Minimax PAC Bounds on the Sample Complexity of Reinforcement Learning with a Generative Model}},
  author    = {Azar, Mohammad Gheshlaghi and Munos, Rémi and Kappen, Hilbert J.},
  journal   = {Machine Learning},
  year      = {2013},
  pages     = {325-349},
  doi       = {10.1007/S10994-013-5368-1},
  volume    = {91},
  url       = {https://mlanthology.org/mlj/2013/azar2013mlj-minimax/}
}