Tracking Value Function Dynamics to Improve Reinforcement Learning with Piecewise Linear Function Approximation

Abstract

Reinforcement learning algorithms can become unstable when combined with linear function approximation. Algorithms that minimize the mean-square Bellman error are guaranteed to converge, but often do so slowly or are computationally expensive. In this paper, we propose to improve the convergence speed of piecewise linear function approximation by tracking the dynamics of the value function with the Kalman filter using a random-walk model. We cast this as a general framework in which we implement the TD, Q-Learning and MAXQ algorithms for different domains, and report empirical results demonstrating improved learning speed over previous methods.
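The abstract does not give the update equations. What follows is a minimal sketch under stated assumptions, not the authors' method. It uses a Kalman filter over a linear weight vector with a random-walk state model. Every name is hypothetical: `hat_features`, `KalmanValueTracker`, `prior_var`, `process_var` and `obs_var`. The sketch makes three further assumptions:

- Piecewise linear approximation is built from triangular "hat" features on a fixed set of knots.
- Process noise is isotropic, with covariance `process_var * I`.
- The observation model is residual-style, treating each reward as a noisy measurement of `(phi(s) - gamma * phi(s'))^T w`.

```python
from typing import List

Vector = List[float]


def hat_features(x: float, knots: Vector) -> Vector:
    """Hypothetical piecewise linear basis: triangular 'hat' functions on sorted knots.

    The value estimate w . phi(x) is then linear between neighbouring knots.
    """
    x = min(max(x, knots[0]), knots[-1])  # clip the input to the knot range
    phi = [0.0] * len(knots)
    for i in range(len(knots) - 1):
        lo, hi = knots[i], knots[i + 1]
        if lo <= x <= hi:
            t = (x - lo) / (hi - lo)
            phi[i], phi[i + 1] = 1.0 - t, t
            break
    return phi


class KalmanValueTracker:
    """Assumed model (illustrative only):
    state:       w_t = w_{t-1} + noise,  noise covariance = process_var * I  (random walk)
    observation: r_t = (phi(s) - gamma * phi(s'))^T w_t + noise,  variance = obs_var
    """

    def __init__(self, n: int, gamma: float = 0.9, prior_var: float = 10.0,
                 process_var: float = 0.01, obs_var: float = 1.0) -> None:
        self.gamma = gamma
        self.q = process_var
        self.r = obs_var
        self.w = [0.0] * n
        # Dense prior covariance over the weights, initialised to prior_var * I.
        self.P = [[prior_var if i == j else 0.0 for j in range(n)] for i in range(n)]

    def value(self, phi: Vector) -> float:
        return sum(wi * fi for wi, fi in zip(self.w, phi))

    def update(self, phi: Vector, reward: float, phi_next: Vector,
               terminal: bool = False) -> float:
        n = len(self.w)
        g = 0.0 if terminal else self.gamma  # no bootstrapping past a terminal state
        h = [phi[i] - g * phi_next[i] for i in range(n)]  # observation vector
        # Predict step: under the random walk the mean is unchanged and P grows by Q.
        for i in range(n):
            self.P[i][i] += self.q
        Ph = [sum(self.P[i][j] * h[j] for j in range(n)) for i in range(n)]
        s = sum(h[i] * Ph[i] for i in range(n)) + self.r  # innovation variance
        k = [v / s for v in Ph]  # Kalman gain
        innovation = reward - sum(h[i] * self.w[i] for i in range(n))  # TD-style error
        # Correct step: w += k * innovation and P = (I - k h^T) P.
        # Because P is symmetric, (h^T P)_j equals Ph[j].
        self.w = [self.w[i] + k[i] * innovation for i in range(n)]
        self.P = [[self.P[i][j] - k[i] * Ph[j] for j in range(n)] for i in range(n)]
        return innovation


if __name__ == "__main__":
    # Deterministic three-state chain 0.0 -> 0.5 -> 1.0 -> terminal, reward 1 at the end.
    knots = [0.0, 0.5, 1.0]
    tracker = KalmanValueTracker(len(knots), gamma=0.9)
    episode = [(0.0, 0.5, 0.0, False), (0.5, 1.0, 0.0, False), (1.0, 1.0, 1.0, True)]
    for _ in range(300):
        for x, x_next, r, done in episode:
            tracker.update(hat_features(x, knots), r, hat_features(x_next, knots), terminal=done)
    print([round(tracker.value(hat_features(x, knots)), 3) for x in knots])
```

On this chain the estimates at the knots should approach the true values 0.81, 0.9 and 1.0. Values between knots are linear interpolations of those estimates.

The process noise keeps the covariance from collapsing to zero, so the filter continues to adapt as bootstrapped targets shift. This is the intuition behind tracking a changing value function. Regressing on a single sampled next state is known to be biased when transitions are stochastic, so treat this observation model as one illustrative choice.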

Cite

Text

Phua and Fitch. "Tracking Value Function Dynamics to Improve Reinforcement Learning with Piecewise Linear Function Approximation." International Conference on Machine Learning, 2007. doi:10.1145/1273496.1273591

Markdown

[Phua and Fitch. "Tracking Value Function Dynamics to Improve Reinforcement Learning with Piecewise Linear Function Approximation." International Conference on Machine Learning, 2007.](https://mlanthology.org/icml/2007/phua2007icml-tracking/) doi:10.1145/1273496.1273591

BibTeX

@inproceedings{phua2007icml-tracking,
  title     = {{Tracking Value Function Dynamics to Improve Reinforcement Learning with Piecewise Linear Function Approximation}},
  author    = {Phua, Chee Wee and Fitch, Robert},
  booktitle = {International Conference on Machine Learning},
  year      = {2007},
  pages     = {751-758},
  doi       = {10.1145/1273496.1273591},
  url       = {https://mlanthology.org/icml/2007/phua2007icml-tracking/}
}