A Generalization Error for Q-Learning

Abstract

Planning problems that involve learning a policy from a single training set of finite-horizon trajectories arise in both social science and medical fields. We consider Q-learning with function approximation for this setting and derive an upper bound on the generalization error. This upper bound is in terms of quantities minimized by a Q-learning algorithm, the complexity of the approximation space, and an approximation term due to the mismatch between Q-learning and the goal of learning a policy that maximizes the value function.
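The setting the abstract describes is batch (offline) Q-learning over finite-horizon trajectories. As a rough sketch of that family of methods, and not the exact estimator analyzed in the paper, the following backward-recursive Q-learning with linear function approximation fits one stagewise least-squares problem per decision point; the function name `fit_q_functions`, the feature map `phi`, and the ridge term are illustrative assumptions.

```python
import numpy as np

def fit_q_functions(trajs, phi, actions, T, ridge=1e-6):
    """Backward-recursive batch Q-learning with linear function
    approximation, fit from a single training set of finite-horizon
    trajectories.

    trajs   : list of (states, acts, rews) tuples, where
              len(states) == T + 1 and len(acts) == len(rews) == T
    phi     : feature map, phi(t, state, action) -> 1-D numpy array
    actions : finite action set searched over in the max
    Returns one weight vector per decision point t = 0, ..., T - 1.
    """
    weights = [None] * T
    for t in reversed(range(T)):  # fit the last decision point first
        X, y = [], []
        for states, acts, rews in trajs:
            target = rews[t]
            if t + 1 < T:  # bootstrap from the stage fit just above
                w_next = weights[t + 1]
                target += max(phi(t + 1, states[t + 1], a) @ w_next
                              for a in actions)
            X.append(phi(t, states[t], acts[t]))
            y.append(target)
        X, y = np.asarray(X), np.asarray(y)
        # Ridge-regularized least squares: each stagewise fit minimizes an
        # empirical squared error, the kind of quantity the bound involves.
        d = X.shape[1]
        weights[t] = np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ y)
    return weights
```

A greedy policy is then read off stage by stage, e.g. `max(actions, key=lambda a: phi(t, s, a) @ weights[t])`; the paper's upper bound relates quantities minimized by such a procedure, together with the complexity of the approximation space and a mismatch term, to the generalization error of the learned policy.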

Cite

Text

Susan A. Murphy. "A Generalization Error for Q-Learning." Journal of Machine Learning Research, 6:1073–1097, 2005.

Markdown

[Susan A. Murphy. "A Generalization Error for Q-Learning." Journal of Machine Learning Research, 6:1073–1097, 2005.](https://mlanthology.org/jmlr/2005/murphy2005jmlr-generalization/)

BibTeX

@article{murphy2005jmlr-generalization,
  title     = {{A Generalization Error for Q-Learning}},
  author    = {Murphy, Susan A.},
  journal   = {Journal of Machine Learning Research},
  year      = {2005},
  pages     = {1073--1097},
  volume    = {6},
  url       = {https://mlanthology.org/jmlr/2005/murphy2005jmlr-generalization/}
}