Relative Loss Bounds for Temporal-Difference Learning

Abstract

. Foster and Vovk proved relative loss bounds for linear regression where the total loss of the on-line algorithm minus the total loss of the best linear predictor (chosen in hindsight) grows logarithmically with the number of trials. We give similar bounds for temporal-difference learning. Learning takes place in a sequence of trials where the learner tries to predict discounted sums of future reinforcement signals. The quality of the prediction is measured with the square loss and we bound the total loss of the on-line algorithm minus the total loss of the best linear predictor for the whole sequence of trials. Again the difference of the losses grows logarithmic with the number of trials. The bounds hold for an arbitrary (worst-case) sequence of examples. We also give a bound on the expected difference for the case when the instances are chosen from an unknown distribution. For linear regression a corresponding lower bound shows that this expected bound cannot be improved substantia...

Cite

Text

Forster and Warmuth. "Relative Loss Bounds for Temporal-Difference Learning." International Conference on Machine Learning, 2000. doi:10.1023/A:1021825927902

Markdown

[Forster and Warmuth. "Relative Loss Bounds for Temporal-Difference Learning." International Conference on Machine Learning, 2000.](https://mlanthology.org/icml/2000/forster2000icml-relative/) doi:10.1023/A:1021825927902

BibTeX

@inproceedings{forster2000icml-relative,
  title     = {{Relative Loss Bounds for Temporal-Difference Learning}},
  author    = {Forster, Jürgen and Warmuth, Manfred K.},
  booktitle = {International Conference on Machine Learning},
  year      = {2000},
  pages     = {295-302},
  doi       = {10.1023/A:1021825927902},
  url       = {https://mlanthology.org/icml/2000/forster2000icml-relative/}
}