Learning Representation and Control in Markov Decision Processes: New Frontiers

Abstract

This paper describes a novel machine learning framework for solving sequential decision problems called Markov decision processes (MDPs) by iteratively computing low-dimensional representations and approximately optimal policies. A unified mathematical framework for learning representation and optimal control in MDPs is presented based on a class of singular operators called Laplacians, whose matrix representations have nonpositive off-diagonal elements and zero row sums. Exact solutions of discounted and average-reward MDPs are expressed in terms of a generalized spectral inverse of the Laplacian called the Drazin inverse. A generic algorithm called representation policy iteration (RPI) is presented, which interleaves computing low-dimensional representations and approximately optimal policies. Two approaches for dimensionality reduction of MDPs are described based on geometric and reward-sensitive regularization, whereby low-dimensional representations are formed by diagonalization or dilation of Laplacian operators. Model-based and model-free variants of the RPI algorithm are presented and compared experimentally on discrete and continuous MDPs. Some directions for future work are outlined in closing.
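To make the Laplacian properties mentioned in the abstract concrete, here is a minimal sketch in NumPy. It constructs the combinatorial Laplacian L = D - W of a hypothetical 5-state chain (the chain MDP is an illustrative assumption, not an example from the paper), verifies the two defining properties (nonpositive off-diagonal elements and zero row sums), and diagonalizes L to obtain a low-dimensional eigenvector basis of the kind used to represent value functions:

```python
import numpy as np

# Hypothetical random walk on a 5-state chain (illustrative only).
# W is a symmetric weight (adjacency) matrix.
n = 5
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0

# Combinatorial Laplacian L = D - W, where D is the degree matrix.
D = np.diag(W.sum(axis=1))
L = D - W

# The two defining properties from the abstract:
assert np.allclose(L.sum(axis=1), 0.0)                # zero row sums
assert np.all(L - np.diag(np.diag(L)) <= 0.0)         # nonpositive off-diagonals

# Diagonalizing L yields an orthonormal eigenvector basis; the
# smoothest eigenvectors (smallest eigenvalues) give a
# low-dimensional representation for approximating value functions.
eigvals, eigvecs = np.linalg.eigh(L)
basis = eigvecs[:, :3]   # a 3-dimensional representation
print(basis.shape)       # (5, 3)
```

The smallest eigenvalue is always 0 (its eigenvector is constant), reflecting the zero-row-sum property; the remaining low-order eigenvectors vary smoothly over the state graph, which is what makes them useful basis functions.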

Cite

Text

Mahadevan. "Learning Representation and Control in Markov Decision Processes: New Frontiers." Foundations and Trends in Machine Learning, 2009. doi:10.1561/2200000003

Markdown

[Mahadevan. "Learning Representation and Control in Markov Decision Processes: New Frontiers." Foundations and Trends in Machine Learning, 2009.](https://mlanthology.org/ftml/2009/mahadevan2009ftml-learning/) doi:10.1561/2200000003

BibTeX

@article{mahadevan2009ftml-learning,
  title     = {{Learning Representation and Control in Markov Decision Processes: New Frontiers}},
  author    = {Mahadevan, Sridhar},
  journal   = {Foundations and Trends in Machine Learning},
  year      = {2009},
  pages     = {403--565},
  doi       = {10.1561/2200000003},
  volume    = {1},
  url       = {https://mlanthology.org/ftml/2009/mahadevan2009ftml-learning/}
}