Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning

Kristopher De Asis, Alan Chan, Silviu Pitis, Richard S. Sutton, Daniel Graves

AAAI 2020 pp. 3741-3748

doi:10.1609/AAAI.V34I04.5784

Abstract

We explore fixed-horizon temporal difference (TD) methods, reinforcement learning algorithms for a new kind of value function that predicts the sum of rewards over a fixed number of future time steps. To learn the value function for horizon h, these algorithms bootstrap from the value function for horizon h−1, or some shorter horizon. Because no value function bootstraps from itself, fixed-horizon methods are immune to the stability problems that plague other off-policy TD methods using function approximation (also known as “the deadly triad”). Although fixed-horizon methods require the storage of additional value functions, this gives the agent additional predictive power, while the added complexity can be substantially reduced via parallel updates, shared weights, and n-step bootstrapping. We show how to use fixed-horizon value functions to solve reinforcement learning problems competitively with methods such as Q-learning that learn conventional value functions. We also prove convergence of fixed-horizon temporal difference methods with linear and general function approximation. Taken together, our results establish fixed-horizon TD methods as a viable new way of avoiding the stability problems of the deadly triad.
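The bootstrapping scheme described in the abstract can be illustrated with a minimal tabular sketch. It assumes one-step bootstrapping, a small discrete state space, and a horizon-0 value fixed at zero; the function name `fixed_horizon_td_update`, the array layout of `V`, and the parameters `alpha` and `gamma` are illustrative choices, not taken from the paper's code. Each horizon h is updated toward the reward plus the horizon h−1 estimate at the next state, so no value function bootstraps from itself.

```python
import numpy as np


def fixed_horizon_td_update(V, s, r, s_next, alpha=0.1, gamma=1.0, done=False):
    """Apply one tabular, one-step fixed-horizon TD update in place.

    V has shape (H + 1, n_states). Row h holds the estimated sum of
    (discounted) rewards over the next h steps. Row 0 is never updated
    and stays at zero (an assumption of this sketch).
    """
    H = V.shape[0] - 1
    # Horizon h bootstraps from horizon h - 1 at the next state.
    # Copy first so every horizon uses pre-update estimates.
    if done:
        bootstrap = np.zeros(H)
    else:
        bootstrap = V[:H, s_next].copy()
    targets = r + gamma * bootstrap
    # Rows 1..H are updated in parallel toward their respective targets.
    V[1:, s] += alpha * (targets - V[1:, s])
    return V


# Hypothetical two-state loop: 0 -> 1 -> 0 -> ..., reward 1 on every step.
H, n_states = 3, 2
V = np.zeros((H + 1, n_states))
for _ in range(10):
    for s in range(n_states):
        fixed_horizon_td_update(V, s, r=1.0, s_next=(s + 1) % n_states, alpha=1.0)
```

In this toy loop with `gamma=1.0`, the true horizon-h value is simply h in both states, which gives an easy check on the update. Updating all horizons in one vectorized step corresponds to the parallel updates the abstract mentions as a way to reduce the added cost.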


Cite

Text

De Asis et al. "Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning." AAAI Conference on Artificial Intelligence, 2020. doi:10.1609/AAAI.V34I04.5784

Markdown

[De Asis et al. "Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning." AAAI Conference on Artificial Intelligence, 2020.](https://mlanthology.org/aaai/2020/asis2020aaai-fixed/) doi:10.1609/AAAI.V34I04.5784

BibTeX

@inproceedings{asis2020aaai-fixed,
  title     = {{Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning}},
  author    = {De Asis, Kristopher and Chan, Alan and Pitis, Silviu and Sutton, Richard S. and Graves, Daniel},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2020},
  pages     = {3741--3748},
  doi       = {10.1609/AAAI.V34I04.5784},
  url       = {https://mlanthology.org/aaai/2020/asis2020aaai-fixed/}
}