PAC Model-Free Reinforcement Learning

Abstract

For a Markov Decision Process with a finite state space (size S) and finite action spaces (size A per state), we propose a new algorithm---Delayed Q-learning. We prove it is PAC, achieving near-optimal performance on all but Õ(SA) timesteps using O(SA) space, improving on the Õ(S²A) bounds of the best previous algorithms. This result shows that efficient reinforcement learning is possible without learning a model of the MDP from experience. Learning takes place from a single continuous thread of experience---no resets or parallel sampling are used. Beyond its smaller storage and experience requirements, Delayed Q-learning's per-experience computation cost is much less than that of previous PAC algorithms.
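The core idea of Delayed Q-learning is to batch experience: for each state-action pair, the agent accumulates m optimistic Bellman targets and only "attempts" an update when the batch is full, committing the new value only if it lowers Q by a meaningful margin. The sketch below illustrates this delayed-update mechanism on a hypothetical one-state, two-action MDP; the toy MDP, the constants m and ε₁, and the simplified learn-flag bookkeeping are illustrative assumptions, not the paper's parameter settings (the paper derives m and ε₁ from S, A, ε, δ, and γ).

```python
import random

# Hypothetical toy MDP (not from the paper): one state, two actions.
# Action 0 always yields reward 1; action 1 yields reward 0; both stay in state 0.
GAMMA = 0.5
M = 5       # samples per attempted update (illustrative; the paper derives m)
EPS1 = 0.1  # per-update improvement tolerance epsilon_1 (illustrative)

def toy_step(state, action):
    """Toy transition function: reward 1 for action 0, else 0; always state 0."""
    return (1.0 if action == 0 else 0.0), 0

def delayed_q_learning(n_steps, seed=0):
    rng = random.Random(seed)
    states, actions = [0], [0, 1]
    vmax = 1.0 / (1.0 - GAMMA)
    Q = {(s, a): vmax for s in states for a in actions}   # optimistic init at Vmax
    U = {(s, a): 0.0 for s in states for a in actions}    # accumulated targets
    count = {(s, a): 0 for s in states for a in actions}  # samples in current batch
    learn = {(s, a): True for s in states for a in actions}
    batch_start = {(s, a): 0 for s in states for a in actions}
    last_success = 0  # time of the most recent successful update anywhere

    s = 0
    for t in range(1, n_steps + 1):
        # Act greedily w.r.t. the current Q, breaking ties at random.
        best = max(Q[(s, a)] for a in actions)
        a = rng.choice([a for a in actions if Q[(s, a)] == best])
        r, s2 = toy_step(s, a)
        if learn[(s, a)]:
            if count[(s, a)] == 0:
                batch_start[(s, a)] = t
            # Accumulate an optimistic one-step Bellman target.
            U[(s, a)] += r + GAMMA * max(Q[(s2, a2)] for a2 in actions)
            count[(s, a)] += 1
            if count[(s, a)] == M:  # attempted update
                target = U[(s, a)] / M
                if Q[(s, a)] - target >= 2 * EPS1:
                    Q[(s, a)] = target + EPS1     # successful update
                    last_success = t
                    learn = {k: True for k in learn}  # re-enable learning everywhere
                elif batch_start[(s, a)] > last_success:
                    learn[(s, a)] = False  # no progress since the last success
                U[(s, a)], count[(s, a)] = 0.0, 0
        s = s2
    return Q

Q = delayed_q_learning(300)
# Q(0,0) stays at Vmax = 2.0 (already correct); Q(0,1) settles at 1 + EPS1 = 1.1.
print(Q)
```

Because values start at Vmax and can only decrease by at least ε₁ per successful update, each state-action pair admits only a bounded number of updates, which is the mechanism behind the Õ(SA) mistake bound.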

Cite

Text

Strehl et al. "PAC Model-Free Reinforcement Learning." International Conference on Machine Learning, 2006. doi:10.1145/1143844.1143955

Markdown

[Strehl et al. "PAC Model-Free Reinforcement Learning." International Conference on Machine Learning, 2006.](https://mlanthology.org/icml/2006/strehl2006icml-pac/) doi:10.1145/1143844.1143955

BibTeX

@inproceedings{strehl2006icml-pac,
  title     = {{PAC Model-Free Reinforcement Learning}},
  author    = {Strehl, Alexander L. and Li, Lihong and Wiewiora, Eric and Langford, John and Littman, Michael L.},
  booktitle = {International Conference on Machine Learning},
  year      = {2006},
  pages     = {881--888},
  doi       = {10.1145/1143844.1143955},
  url       = {https://mlanthology.org/icml/2006/strehl2006icml-pac/}
}