Temporal Difference Bayesian Model Averaging: A Bayesian Perspective on Adapting Lambda

Abstract

Temporal difference (TD) algorithms are attractive for reinforcement learning due to their ease of implementation and use of "bootstrapped" return estimates to make efficient use of sampled data. In particular, TD(lambda) methods comprise a family of reinforcement learning algorithms that often yield fast convergence by averaging multiple estimators of the expected return. However, TD(lambda) chooses a very specific way of averaging these estimators based on the fixed parameter lambda, which may not lead to optimal convergence rates in all settings. In this paper, we derive an automated Bayesian approach to setting lambda that we call temporal difference Bayesian model averaging (TD-BMA). Empirically, TD-BMA always performs at least as well as, and often much better than, the best fixed lambda for TD(lambda) (even when performance for different values of lambda varies across problems) without requiring that lambda or any analogous parameter be manually tuned.
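To make the "specific way of averaging" concrete, the following minimal sketch (not from the paper; function and variable names are illustrative) computes the lambda-return for a finite episode: a weighted average of the n-step bootstrapped returns with weights (1 - lambda) * lambda^(n-1), and tail weight lambda^(T-1) on the full Monte Carlo return. TD-BMA replaces these fixed weights with Bayesian model-averaging weights.

```python
def lambda_return(rewards, values, gamma, lam):
    """Lambda-return at t=0 for a finite episode of length T.

    rewards: [r_1, ..., r_T] observed rewards.
    values:  [V(s_1), ..., V(s_T)] bootstrapped value estimates of the
             states reached after each step (V(s_T) is unused: the
             T-step return is the full Monte Carlo return).
    """
    T = len(rewards)
    discounted_sum = 0.0
    n_step_returns = []
    for n in range(1, T + 1):
        # Accumulate sum_{k=1}^{n} gamma^{k-1} r_k.
        discounted_sum += gamma ** (n - 1) * rewards[n - 1]
        # Bootstrap with gamma^n V(s_n), except at episode end.
        bootstrap = gamma ** n * values[n - 1] if n < T else 0.0
        n_step_returns.append(discounted_sum + bootstrap)
    # TD(lambda) weights: (1 - lam) lam^(n-1) for n < T, lam^(T-1) for n = T.
    weights = [(1 - lam) * lam ** (n - 1) for n in range(1, T)]
    weights.append(lam ** (T - 1))
    return sum(w * g for w, g in zip(weights, n_step_returns))
```

With lam = 0 this reduces to the one-step TD target r_1 + gamma * V(s_1); with lam = 1 it reduces to the Monte Carlo return, recovering the two extremes that the fixed lambda interpolates between.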

Cite

Text

Downey and Sanner. "Temporal Difference Bayesian Model Averaging: A Bayesian Perspective on Adapting Lambda." International Conference on Machine Learning, 2010.

Markdown

[Downey and Sanner. "Temporal Difference Bayesian Model Averaging: A Bayesian Perspective on Adapting Lambda." International Conference on Machine Learning, 2010.](https://mlanthology.org/icml/2010/downey2010icml-temporal/)

BibTeX

@inproceedings{downey2010icml-temporal,
  title     = {{Temporal Difference Bayesian Model Averaging: A Bayesian Perspective on Adapting Lambda}},
  author    = {Downey, Carlton and Sanner, Scott},
  booktitle = {International Conference on Machine Learning},
  year      = {2010},
  pages     = {311--318},
  url       = {https://mlanthology.org/icml/2010/downey2010icml-temporal/}
}