Linear Least-Squares Algorithms for Temporal Difference Learning

Abstract

We introduce two new temporal difference (TD) algorithms based on the theory of linear least-squares function approximation. We define an algorithm we call Least-Squares TD (LS TD) for which we prove probability-one convergence when it is used with a function approximator linear in the adjustable parameters. We then define a recursive version of this algorithm, Recursive Least-Squares TD (RLS TD). Although these new TD algorithms require more computation per time-step than do Sutton's TD(λ) algorithms, they are more efficient in a statistical sense because they extract more information from training experiences. We describe a simulation experiment showing the substantial improvement in learning rate achieved by RLS TD in an example Markov prediction problem. To quantify this improvement, we introduce the TD error variance of a Markov chain, σ_TD, and experimentally conclude that the convergence rate of a TD algorithm depends linearly on σ_TD. In addition to converging more rapidly, LS TD and RLS TD have no control parameters, such as a learning rate, thus eliminating the possibility of achieving poor performance through an unlucky choice of parameters.
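The two algorithms the abstract describes can be sketched for the linear case: batch LS TD solves a linear system built from observed transitions in one step, and RLS TD maintains the inverse of that system recursively so no learning rate is needed. Below is a minimal NumPy sketch under our own assumptions; the tiny two-state Markov chain, the one-hot features, and all variable names are illustrative and not taken from the paper.

```python
import numpy as np

gamma = 0.9
# Deterministic 2-state cycle 0 -> 1 -> 0 -> ..., reward 1 per step, with
# one-hot features, so the true value of each state is 1 / (1 - gamma) = 10.
transitions = [(np.array([1.0, 0.0]), 1.0, np.array([0.0, 1.0])),
               (np.array([0.0, 1.0]), 1.0, np.array([1.0, 0.0]))] * 5

# --- Batch LS TD: accumulate A and b from transitions, then solve
# A theta = b directly (no step-size parameter involved). ---
A = np.zeros((2, 2))
b = np.zeros(2)
for phi, r, phi_next in transitions:
    A += np.outer(phi, phi - gamma * phi_next)
    b += r * phi
theta_batch = np.linalg.solve(A, b)

# --- RLS TD: keep C ~ A^{-1} up to date with the Sherman-Morrison
# rank-one update, so each step costs O(d^2) instead of re-solving. ---
eps = 1e-6                    # small regularizer for the initial inverse
C = np.eye(2) / eps
b_r = np.zeros(2)
for phi, r, phi_next in transitions:
    u = phi - gamma * phi_next
    Cphi = C @ phi
    C -= np.outer(Cphi, u @ C) / (1.0 + u @ Cphi)
    b_r += r * phi
theta_rls = C @ b_r

print(theta_batch)  # both estimates recover the true values, [10, 10]
print(theta_rls)
```

On this chain both solutions agree with the exact value function; the recursive variant trades a little extra per-step arithmetic for never having to choose a learning rate, which is the trade-off the abstract highlights.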

Cite

Text

Bradtke and Barto. "Linear Least-Squares Algorithms for Temporal Difference Learning." Machine Learning, 1996. doi:10.1023/A:1018056104778

Markdown

[Bradtke and Barto. "Linear Least-Squares Algorithms for Temporal Difference Learning." Machine Learning, 1996.](https://mlanthology.org/mlj/1996/bradtke1996mlj-linear/) doi:10.1023/A:1018056104778

BibTeX

@article{bradtke1996mlj-linear,
  title     = {{Linear Least-Squares Algorithms for Temporal Difference Learning}},
  author    = {Bradtke, Steven J. and Barto, Andrew G.},
  journal   = {Machine Learning},
  year      = {1996},
  pages     = {33--57},
  doi       = {10.1023/A:1018056104778},
  volume    = {22},
  url       = {https://mlanthology.org/mlj/1996/bradtke1996mlj-linear/}
}