Automatic Basis Function Construction for Approximate Dynamic Programming and Reinforcement Learning

Abstract

We address the problem of automatically constructing basis functions for linear approximation of the value function of a Markov Decision Process (MDP). Our work builds on results by Bertsekas and Castañon (1989), who proposed a method for automatically aggregating states to speed up value iteration. We propose to use neighborhood component analysis (Goldberger et al., 2005), a dimensionality reduction technique created for supervised learning, in order to map a high-dimensional state space to a low-dimensional space, based on the Bellman error or on the temporal difference (TD) error. We then place basis functions in the lower-dimensional space; these are added as new features for the linear function approximator. This approach is applied to a high-dimensional inventory control problem.
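
A minimal sketch of the pipeline the abstract describes, assuming vector-valued states and a linear value function. An error-weighted SVD projection stands in for the NCA step (the paper adapts neighborhood component analysis to the Bellman/TD error); all names here (td_errors, error_aware_projection, rbf_features, the RBF width) are illustrative assumptions, not the authors' implementation:

import numpy as np

def td_errors(phi, phi_next, rewards, w, gamma=0.95):
    # delta_i = r_i + gamma * V(s'_i) - V(s_i), with V(s) = phi(s) @ w
    return rewards + gamma * (phi_next @ w) - (phi @ w)

def error_aware_projection(states, deltas, k=2):
    # Stand-in for NCA: weight (centered) states by the magnitude of their
    # TD error and take the top-k right-singular directions, so the learned
    # projection emphasizes directions along which the Bellman error varies.
    X = states - states.mean(0)
    weighted = X * np.abs(deltas)[:, None]
    _, _, Vt = np.linalg.svd(weighted, full_matrices=False)
    return Vt[:k]                               # (k, d) projection matrix

def rbf_features(states, A, centers, width=1.0):
    # Place Gaussian basis functions at `centers` in the low-dimensional
    # space; their activations become new features for the linear approximator.
    Z = states @ A.T                            # (n, k) projected states
    d2 = ((Z[:, None, :] - centers[None]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * width ** 2))

# Usage: one round of feature construction on synthetic transition data.
rng = np.random.default_rng(0)
S, S_next = rng.normal(size=(500, 20)), rng.normal(size=(500, 20))
r = rng.normal(size=500)
w = rng.normal(size=20)                         # current linear value weights
delta = td_errors(S, S_next, r, w)
A = error_aware_projection(S, delta, k=2)
centers = (S @ A.T)[rng.choice(500, size=10, replace=False)]
new_phi = np.hstack([S, rbf_features(S, A, centers)])  # augmented features

In the paper itself the projection is learned by an NCA-style gradient on the error targets and the procedure is iterated, re-fitting the value function each time new basis functions are added; the SVD stand-in above only mimics that effect in closed form.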

Cite

Text

Keller et al. "Automatic Basis Function Construction for Approximate Dynamic Programming and Reinforcement Learning." International Conference on Machine Learning, 2006. doi:10.1145/1143844.1143901

Markdown

[Keller et al. "Automatic Basis Function Construction for Approximate Dynamic Programming and Reinforcement Learning." International Conference on Machine Learning, 2006.](https://mlanthology.org/icml/2006/keller2006icml-automatic/) doi:10.1145/1143844.1143901

BibTeX

@inproceedings{keller2006icml-automatic,
  title     = {{Automatic Basis Function Construction for Approximate Dynamic Programming and Reinforcement Learning}},
  author    = {Keller, Philipp W. and Mannor, Shie and Precup, Doina},
  booktitle = {International Conference on Machine Learning},
  year      = {2006},
  pages     = {449--456},
  doi       = {10.1145/1143844.1143901},
  url       = {https://mlanthology.org/icml/2006/keller2006icml-automatic/}
}