Backstepping Temporal Difference Learning

Abstract

Off-policy learning ability is an important feature of reinforcement learning (RL) for practical applications. However, even one of the most elementary RL algorithms, temporal-difference (TD) learning, is known to suffer from divergence issues when the off-policy scheme is used together with linear function approximation. To overcome this divergent behavior, several off-policy TD learning algorithms have been developed to date. In this work, we provide a unified view of such algorithms from a purely control-theoretic perspective. Our method relies on the backstepping technique, which is widely used in nonlinear control theory.
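To make the divergence issue mentioned in the abstract concrete, here is a minimal sketch of standard off-policy TD(0) with linear function approximation and importance sampling, not the paper's backstepping method. The MDP, the feature matrix, the step size, and the representation of policies as induced state-transition matrices are all illustrative assumptions.

```python
import numpy as np

# Sketch: off-policy linear TD(0) with importance sampling.
# All quantities below (MDP size, features, step size) are made up
# for illustration; this is not the paper's algorithm.

rng = np.random.default_rng(0)

n_states, n_features = 5, 3
gamma = 0.9
alpha = 0.01

# Policies are represented directly by the state-transition matrices
# they induce (rows sum to 1), so the ratio of next-state probabilities
# plays the role of the importance-sampling ratio.
P_target = rng.random((n_states, n_states))
P_target /= P_target.sum(axis=1, keepdims=True)
P_behavior = rng.random((n_states, n_states))
P_behavior /= P_behavior.sum(axis=1, keepdims=True)

Phi = rng.standard_normal((n_states, n_features))  # fixed state features
rewards = rng.standard_normal(n_states)

theta = np.zeros(n_features)
s = 0
for t in range(10_000):
    s_next = rng.choice(n_states, p=P_behavior[s])
    rho = P_target[s, s_next] / P_behavior[s, s_next]  # importance weight
    td_error = rewards[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
    theta += alpha * rho * td_error * Phi[s]  # may diverge off-policy
    s = s_next

print("final weights:", theta)
```

For unfavorable combinations of features and behavior/target policies (Baird's counterexample is the classical case), the expected update of this iteration is governed by a matrix that is not negative definite, so the weights can grow without bound. Gradient-TD-style methods, and the backstepping-based view developed in the paper, are designed to address exactly this failure mode.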

Cite

Text

Lim and Lee. "Backstepping Temporal Difference Learning." International Conference on Learning Representations, 2023.

Markdown

[Lim and Lee. "Backstepping Temporal Difference Learning." International Conference on Learning Representations, 2023.](https://mlanthology.org/iclr/2023/lim2023iclr-backstepping/)

BibTeX

@inproceedings{lim2023iclr-backstepping,
  title     = {{Backstepping Temporal Difference Learning}},
  author    = {Lim, Han-Dong and Lee, Donghwan},
  booktitle = {International Conference on Learning Representations},
  year      = {2023},
  url       = {https://mlanthology.org/iclr/2023/lim2023iclr-backstepping/}
}