Learning Representation and Control in Markov Decision Processes: New Frontiers
Abstract
This paper describes a novel machine learning framework for solving sequential decision problems, known as Markov decision processes (MDPs), by iteratively computing low-dimensional representations and approximately optimal policies. A unified mathematical framework for learning representation and optimal control in MDPs is presented based on a class of singular operators called Laplacians, whose matrix representations have nonpositive off-diagonal elements and zero row sums. Exact solutions of discounted and average-reward MDPs are expressed in terms of a generalized spectral inverse of the Laplacian called the Drazin inverse. A generic algorithm called representation policy iteration (RPI) is presented, which interleaves computing low-dimensional representations and approximately optimal policies. Two approaches for dimensionality reduction of MDPs are described based on geometric and reward-sensitive regularization, whereby low-dimensional representations are formed by diagonalization or dilation of Laplacian operators. Model-based and model-free variants of the RPI algorithm are presented and compared experimentally on discrete and continuous MDPs. Finally, some directions for future work are outlined.
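The two Laplacian properties named in the abstract (nonpositive off-diagonal elements and zero row sums) and the role of a generalized inverse can be illustrated on a toy graph. This is a minimal sketch, not from the paper itself: it builds the combinatorial Laplacian of a hypothetical 3-node path graph and, since that Laplacian is symmetric (so its Drazin inverse coincides with the Moore-Penrose pseudoinverse), computes the generalized inverse with `np.linalg.pinv`.

```python
import numpy as np

# Adjacency matrix of a hypothetical 3-node path graph: 0 -- 1 -- 2.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
D = np.diag(A.sum(axis=1))   # degree matrix
L = D - A                    # combinatorial graph Laplacian

# The two defining properties mentioned in the abstract:
assert np.allclose(L.sum(axis=1), 0)            # zero row sums
assert (L - np.diag(np.diag(L)) <= 0).all()     # nonpositive off-diagonals

# For a symmetric Laplacian, the Drazin inverse equals the
# Moore-Penrose pseudoinverse, so pinv suffices for this sketch.
L_drazin = np.linalg.pinv(L)
assert np.allclose(L @ L_drazin @ L, L)         # generalized-inverse identity
```

For nonsymmetric Laplacians (e.g. those arising from directed transition matrices of MDPs), the Drazin inverse differs from the pseudoinverse and requires a dedicated construction.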
Cite
Text
Mahadevan. "Learning Representation and Control in Markov Decision Processes: New Frontiers." Foundations and Trends in Machine Learning, 2009. doi:10.1561/2200000003
Markdown
[Mahadevan. "Learning Representation and Control in Markov Decision Processes: New Frontiers." Foundations and Trends in Machine Learning, 2009.](https://mlanthology.org/ftml/2009/mahadevan2009ftml-learning/) doi:10.1561/2200000003
BibTeX
@article{mahadevan2009ftml-learning,
title = {{Learning Representation and Control in Markov Decision Processes: New Frontiers}},
author = {Mahadevan, Sridhar},
journal = {Foundations and Trends in Machine Learning},
year = {2009},
pages = {403-565},
doi = {10.1561/2200000003},
volume = {1},
url = {https://mlanthology.org/ftml/2009/mahadevan2009ftml-learning/}
}