Learning Representation and Control in Continuous Markov Decision Processes
Abstract
This paper presents a novel framework for simultaneously learning representation and control in continuous Markov decision processes. Our approach builds on the framework of proto-value functions, in which the underlying representation or basis functions are automatically derived from a spectral analysis of the state space manifold. The proto-value functions correspond to the eigenfunctions of the graph Laplacian. We describe an approach to extend the eigenfunctions to novel states using the Nyström extension. A least-squares policy iteration method is used to learn the control policy, where the underlying subspace for approximating the value function is spanned by the learned proto-value functions. A detailed set of experiments is presented using classic benchmark tasks, including the inverted pendulum and the mountain car, showing the sensitivity in performance to various parameters, and including comparisons with a parametric radial basis function method.
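The representation-learning step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a symmetric k-nearest-neighbor graph with Gaussian edge weights, the normalized graph Laplacian, and a simple Nyström-style extension over the nearest sampled states. The function names (`build_graph`, `proto_value_functions`, `nystrom_extend`), the parameters `k` and `sigma`, and the random 2-D sample are all hypothetical choices made for illustration.

```python
import numpy as np

def build_graph(states, k=5, sigma=0.5):
    """Symmetric k-nearest-neighbor graph with Gaussian edge weights (assumed construction)."""
    n = len(states)
    d2 = ((states[:, None, :] - states[None, :, :]) ** 2).sum(axis=-1)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]  # index 0 is the point itself
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (2 * sigma ** 2))
    return np.maximum(W, W.T)  # symmetrize

def proto_value_functions(W, num_basis):
    """Smoothest eigenvectors of the normalized Laplacian L = I - D^{-1/2} W D^{-1/2}."""
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    L = np.eye(len(W)) - S
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return eigvals[:num_basis], eigvecs[:, :num_basis], deg

def nystrom_extend(x, states, deg, eigvals, eigvecs, k=5, sigma=0.5):
    """Nystrom-style estimate of each basis function at an unseen state x.

    Uses phi(x) = (1 / (1 - lambda)) * sum_y w(x, y) / sqrt(d(x) d(y)) * phi(y),
    restricted to the k nearest sampled states (an assumption of this sketch).
    """
    d2 = ((states - x) ** 2).sum(axis=1)
    nbrs = np.argsort(d2)[:k]
    w = np.exp(-d2[nbrs] / (2 * sigma ** 2))
    s = w / np.sqrt(w.sum() * deg[nbrs])
    return (s @ eigvecs[nbrs]) / (1.0 - eigvals)

# Hypothetical data: 200 sampled states from a 2-D continuous state space.
rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=(200, 2))
W = build_graph(states)
eigvals, basis, deg = proto_value_functions(W, num_basis=5)
phi_new = nystrom_extend(np.array([0.1, -0.3]), states, deg, eigvals, basis)
```

In a least-squares policy iteration loop, the rows of `basis` (for sampled states) and vectors such as `phi_new` (for states encountered later) would serve as the features in which the value function is expressed linearly.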
Cite
Text
Mahadevan et al. "Learning Representation and Control in Continuous Markov Decision Processes." AAAI Conference on Artificial Intelligence, 2006.
Markdown
[Mahadevan et al. "Learning Representation and Control in Continuous Markov Decision Processes." AAAI Conference on Artificial Intelligence, 2006.](https://mlanthology.org/aaai/2006/mahadevan2006aaai-learning/)
BibTeX
@inproceedings{mahadevan2006aaai-learning,
title = {{Learning Representation and Control in Continuous Markov Decision Processes}},
author = {Mahadevan, Sridhar and Maggioni, Mauro and Ferguson, Kimberly and Osentoski, Sarah},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2006},
pages = {1194-1199},
url = {https://mlanthology.org/aaai/2006/mahadevan2006aaai-learning/}
}