Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms
Abstract
In this paper, we address two issues of long-standing interest in the reinforcement learning literature. First, what kinds of performance guarantees can be made for Q-learning after only a finite number of actions? Second, what quantitative comparisons can be made between Q-learning and model-based (indirect) approaches, which use experience to estimate next-state distributions for off-line value iteration?

We first show that both Q-learning and the indirect approach enjoy rather rapid convergence to the optimal policy as a function of the number of state transitions observed. In particular, on the order of only (N log(1/ε)/ε²)(log(N) + log log(1/ε)) transitions are sufficient for both algorithms to come within ε of the optimal policy, in an idealized model that assumes the observed transitions are "well-mixed" throughout an N-state MDP. Thus, the two approaches have roughly the same sample complexity. Perhaps surprisingly, this sample complexity is far less than what is required for the model-based approach to actually construct a good approximation to the next-state distribution. The result also shows that the amount of memory required by the model-based approach is closer to N than to N².

For either approach, to remove the assumption that the observed transitions are well-mixed, we consider a model in which the transitions are determined by a fixed, arbitrary exploration policy. Bounds on the number of transitions required in order to achieve a desired level of performance are then related to the stationary distribution and mixing time of this policy.
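The abstract contrasts the direct (Q-learning) and indirect (model-based) approaches under an idealized "well-mixed" sampling model. The minimal Python sketch below, which is not from the paper, illustrates the two approaches side by side on a small random MDP: Q-learning applies its incremental update to each observed transition, while the indirect method estimates the next-state distributions from counts and then runs off-line value iteration. All names and parameter values here (N, A, gamma, alpha, the helper functions) are hypothetical choices for illustration only.

```python
# Illustrative sketch only: tabular Q-learning (direct) versus a
# certainty-equivalence model-based method (indirect) on a small random MDP.
# The MDP and all parameters are hypothetical, chosen for demonstration.
import numpy as np

rng = np.random.default_rng(0)
N, A, gamma = 5, 2, 0.9          # states, actions, discount factor

# A random ground-truth MDP: P[s, a] is a next-state distribution, R[s, a] a reward.
P = rng.dirichlet(np.ones(N), size=(N, A))
R = rng.random((N, A))

def sample_transitions(n):
    """Draw n 'well-mixed' transitions: (s, a) uniform at random, s' ~ P[s, a]."""
    out = []
    for _ in range(n):
        s, a = rng.integers(N), rng.integers(A)
        s2 = rng.choice(N, p=P[s, a])
        out.append((s, a, R[s, a], s2))
    return out

def q_learning(transitions, alpha=0.1):
    """Direct approach: apply the Q-learning update to each observed transition."""
    Q = np.zeros((N, A))
    for s, a, r, s2 in transitions:
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
    return Q

def model_based(transitions, sweeps=200):
    """Indirect approach: estimate P-hat from counts, then run value iteration."""
    counts = np.zeros((N, A, N))
    for s, a, _, s2 in transitions:
        counts[s, a, s2] += 1
    totals = counts.sum(axis=2, keepdims=True)
    # Unvisited (s, a) pairs fall back to a uniform next-state estimate.
    P_hat = np.where(totals > 0, counts / np.maximum(totals, 1), 1.0 / N)
    Q = np.zeros((N, A))
    for _ in range(sweeps):
        Q = R + gamma * P_hat @ Q.max(axis=1)   # one Bellman backup per sweep
    return Q

data = sample_transitions(5000)
print("Q-learning greedy policy :", q_learning(data).argmax(axis=1))
print("Model-based greedy policy:", model_based(data).argmax(axis=1))
```

With enough well-mixed transitions, both greedy policies typically agree, consistent with the paper's point that the two approaches have roughly the same sample complexity.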
Cite
Text
Kearns and Singh. "Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms." Neural Information Processing Systems, 1998.Markdown
[Kearns and Singh. "Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms." Neural Information Processing Systems, 1998.](https://mlanthology.org/neurips/1998/kearns1998neurips-finitesample/)BibTeX
@inproceedings{kearns1998neurips-finitesample,
title = {{Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms}},
author = {Kearns, Michael J. and Singh, Satinder P.},
booktitle = {Neural Information Processing Systems},
year = {1998},
pages = {996-1002},
url = {https://mlanthology.org/neurips/1998/kearns1998neurips-finitesample/}
}