On the Sample Complexity of Reinforcement Learning with a Generative Model
Abstract
We consider the problem of learning the optimal action-value function in discounted-reward Markov decision processes (MDPs). We prove a new PAC bound on the sample complexity of the model-based value iteration algorithm in the presence of a generative model, which indicates that for an MDP with N state-action pairs and discount factor γ ∈ [0, 1), only O(N log(N/δ)/((1 - γ)³ε²)) samples are required to find an ε-optimal estimate of the action-value function with probability 1 - δ. We also prove a matching lower bound of Θ(N log(N/δ)/((1 - γ)³ε²)) on the sample complexity of estimating the optimal action-value function by any RL algorithm. To the best of our knowledge, this is the first result in which the upper bound on the sample complexity of estimating the optimal (action-) value function matches the lower bound in terms of N, ε, δ and 1/(1 - γ). Also, both our lower bound and our upper bound significantly improve on the state-of-the-art in terms of their dependence on 1/(1 - γ).
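The abstract refers to model-based value iteration with a generative model: draw a fixed number of next-state samples per state-action pair, build an empirical MDP, and run Q-value iteration on it. The following is a minimal sketch of that scheme under illustrative assumptions (a known tabular MDP (P, R) standing in for the generative model, a per-pair sample budget n, and hypothetical helper names); it is not the authors' code.

```python
import numpy as np

def sample_next_states(P, s, a, n, rng):
    """Draw n next-state samples for pair (s, a) from the generative model."""
    return rng.choice(P.shape[-1], size=n, p=P[s, a])

def model_based_q_iteration(P, R, gamma, n, num_iters, seed=0):
    """Estimate Q*: (1) build an empirical transition model from n samples
    per state-action pair, (2) run Q-value iteration on the empirical MDP."""
    rng = np.random.default_rng(seed)
    S, A, _ = P.shape

    # Step 1: empirical transition probabilities from generative-model samples.
    P_hat = np.zeros_like(P)
    for s in range(S):
        for a in range(A):
            next_states = sample_next_states(P, s, a, n, rng)
            P_hat[s, a] = np.bincount(next_states, minlength=S) / n

    # Step 2: Q-value iteration on the empirical MDP (P_hat, R).
    Q = np.zeros((S, A))
    for _ in range(num_iters):
        V = Q.max(axis=1)          # greedy value of the current Q
        Q = R + gamma * P_hat @ V  # Bellman optimality backup
    return Q
```

Per the paper's bound, taking n on the order of log(N/δ)/((1 - γ)³ε²) samples for each of the N = S·A pairs suffices for the resulting Q to be ε-optimal with probability 1 - δ.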
Cite
Text
Azar et al. "On the Sample Complexity of Reinforcement Learning with a Generative Model." International Conference on Machine Learning, 2012.
Markdown
[Azar et al. "On the Sample Complexity of Reinforcement Learning with a Generative Model." International Conference on Machine Learning, 2012.](https://mlanthology.org/icml/2012/azar2012icml-sample/)
BibTeX
@inproceedings{azar2012icml-sample,
  title = {{On the Sample Complexity of Reinforcement Learning with a Generative Model}},
  author = {Azar, Mohammad Gheshlaghi and Munos, Rémi and Kappen, Bert},
  booktitle = {International Conference on Machine Learning},
  year = {2012},
  url = {https://mlanthology.org/icml/2012/azar2012icml-sample/}
}