Learning Representation and Control in Continuous Markov Decision Processes

Abstract

This paper presents a novel framework for simultaneously learning representation and control in continuous Markov decision processes. Our approach builds on the framework of proto-value functions, in which the underlying representation or basis functions are automatically derived from a spectral analysis of the state space manifold. The proto-value functions correspond to the eigenfunctions of the graph Laplacian. We describe an approach to extend the eigenfunctions to novel states using the Nyström extension. A least-squares policy iteration method is used to learn the control policy, where the underlying subspace for approximating the value function is spanned by the learned proto-value functions. A detailed set of experiments is presented on classic benchmark tasks, including the inverted pendulum and the mountain car, showing the sensitivity of performance to various parameters and comparing against a parametric radial basis function method.
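The pipeline the abstract describes can be illustrated with a minimal sketch: sample states, build a nearest-neighbour graph over them, take the smoothest eigenvectors of the normalized graph Laplacian as proto-value functions, and extend them to novel states with the Nyström formula. The function names, the Gaussian edge weights, and all parameter values below are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def proto_value_functions(states, k=4, num_basis=3, sigma=1.0):
    """Sketch: proto-value functions as the smoothest eigenvectors of the
    normalized graph Laplacian built over sampled states (n, d)."""
    n = states.shape[0]
    # Pairwise squared distances between sampled states.
    d2 = ((states[:, None, :] - states[None, :, :]) ** 2).sum(-1)
    # Symmetrized k-nearest-neighbour graph with Gaussian edge weights
    # (an assumed graph construction, chosen for simplicity).
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]          # skip self (distance 0)
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (2 * sigma ** 2))
    W = np.maximum(W, W.T)
    deg = W.sum(axis=1)
    # Normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    Dinv = np.diag(1.0 / np.sqrt(deg))
    L = np.eye(n) - Dinv @ W @ Dinv
    evals, evecs = np.linalg.eigh(L)                # ascending eigenvalues
    # The low-order (smoothest) eigenvectors serve as basis functions.
    return evals[:num_basis], evecs[:, :num_basis], deg

def nystrom_extend(x, states, evals, evecs, deg, sigma=1.0):
    """Sketch of the Nyström extension of the eigenvectors to a novel
    state x, using eigenvalue 1 - lambda of D^{-1/2} W D^{-1/2}."""
    w = np.exp(-((states - x) ** 2).sum(axis=1) / (2 * sigma ** 2))
    return (w / np.sqrt(w.sum() * deg)) @ evecs / (1.0 - evals)
```

In a full implementation, the resulting feature vectors would feed a least-squares policy iteration learner as the basis for value-function approximation; that control step is omitted here.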

Cite

Text

Mahadevan et al. "Learning Representation and Control in Continuous Markov Decision Processes." AAAI Conference on Artificial Intelligence, 2006.

Markdown

[Mahadevan et al. "Learning Representation and Control in Continuous Markov Decision Processes." AAAI Conference on Artificial Intelligence, 2006.](https://mlanthology.org/aaai/2006/mahadevan2006aaai-learning/)

BibTeX

@inproceedings{mahadevan2006aaai-learning,
  title     = {{Learning Representation and Control in Continuous Markov Decision Processes}},
  author    = {Mahadevan, Sridhar and Maggioni, Mauro and Ferguson, Kimberly and Osentoski, Sarah},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2006},
  pages     = {1194--1199},
  url       = {https://mlanthology.org/aaai/2006/mahadevan2006aaai-learning/}
}