A Worst-Case Comparison Between Temporal Difference and Residual Gradient with Linear Function Approximation

Abstract

Residual gradient (RG) was proposed as an alternative to TD(0) for policy evaluation with function approximation, but, outside of very limited cases, little formal analysis exists comparing the two. This paper employs techniques from online learning of linear functions and provides a worst-case analysis comparing the two types of algorithms when linear function approximation is used. No statistical assumptions are made on the sequence of observations. In particular, our results suggest that RG may result in smaller temporal differences, while TD(0) is more likely to yield smaller prediction errors. These phenomena can be observed even in two simple Markov chain examples.
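For readers unfamiliar with the two algorithms the abstract compares, the following is a minimal sketch of their per-step updates with linear function approximation. Both compute the same temporal difference δ; they differ only in the direction used to update the weights. The state features, reward, and step size below are illustrative choices, not values taken from the paper.

```python
import numpy as np

def td_error(theta, phi_s, phi_s2, r, gamma):
    """Temporal difference: delta = r + gamma * V(s') - V(s),
    where V(s) = theta . phi(s) under linear approximation."""
    return r + gamma * (theta @ phi_s2) - theta @ phi_s

def td0_update(theta, phi_s, phi_s2, r, gamma, alpha):
    """TD(0): move theta along phi(s) only, treating the
    bootstrapped target r + gamma * V(s') as fixed."""
    delta = td_error(theta, phi_s, phi_s2, r, gamma)
    return theta + alpha * delta * phi_s

def rg_update(theta, phi_s, phi_s2, r, gamma, alpha):
    """Residual gradient: exact gradient descent on delta^2 / 2,
    so the successor features phi(s') also enter the update."""
    delta = td_error(theta, phi_s, phi_s2, r, gamma)
    return theta - alpha * delta * (gamma * phi_s2 - phi_s)

if __name__ == "__main__":
    # Illustrative transition s -> s' with tabular-style features.
    theta = np.zeros(2)
    phi_s, phi_s2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
    r, gamma, alpha = 1.0, 0.9, 0.1
    print("TD(0):", td0_update(theta, phi_s, phi_s2, r, gamma, alpha))
    print("RG:   ", rg_update(theta, phi_s, phi_s2, r, gamma, alpha))
```

Starting from the same weights, the two rules move θ in different directions whenever γφ(s') ≠ 0, which is the source of the differing worst-case behavior (smaller temporal differences for RG versus smaller prediction errors for TD(0)) analyzed in the paper.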

Cite

Text

Li. "A Worst-Case Comparison Between Temporal Difference and Residual Gradient with Linear Function Approximation." International Conference on Machine Learning, 2008. doi:10.1145/1390156.1390227

Markdown

[Li. "A Worst-Case Comparison Between Temporal Difference and Residual Gradient with Linear Function Approximation." International Conference on Machine Learning, 2008.](https://mlanthology.org/icml/2008/li2008icml-worst/) doi:10.1145/1390156.1390227

BibTeX

@inproceedings{li2008icml-worst,
  title     = {{A Worst-Case Comparison Between Temporal Difference and Residual Gradient with Linear Function Approximation}},
  author    = {Li, Lihong},
  booktitle = {International Conference on Machine Learning},
  year      = {2008},
  pages     = {560--567},
  doi       = {10.1145/1390156.1390227},
  url       = {https://mlanthology.org/icml/2008/li2008icml-worst/}
}