Learning Representation and Control in Markov Decision Processes: New Frontiers

Abstract

This paper describes a novel machine learning framework for solving sequential decision problems called Markov decision processes (MDPs) by iteratively computing low-dimensional representations and approximately optimal policies. A unified mathematical framework for learning representation and optimal control in MDPs is presented based on a class of singular operators called Laplacians, whose matrix representations have nonpositive off-diagonal elements and zero row sums. Exact solutions of discounted and average-reward MDPs are expressed in terms of a generalized spectral inverse of the Laplacian called the Drazin inverse. A generic algorithm called representation policy iteration (RPI) is presented, which interleaves computing low-dimensional representations and approximately optimal policies. Two approaches for dimensionality reduction of MDPs are described based on geometric and reward-sensitive regularization, whereby low-dimensional representations are formed by diagonalization or dilation of Laplacian operators. Model-based and model-free variants of the RPI algorithm are presented and compared experimentally on discrete and continuous MDPs. Some directions for future work are outlined in closing.
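To make the Laplacian properties mentioned in the abstract concrete, here is a minimal sketch in NumPy. It constructs the combinatorial Laplacian L = D - W of a hypothetical 5-state chain (the chain MDP is an illustrative assumption, not an example from the paper), verifies the two defining properties (nonpositive off-diagonal elements and zero row sums), and diagonalizes L to obtain a low-dimensional eigenvector basis of the kind used to represent value functions:

```python
import numpy as np

# Hypothetical random walk on a 5-state chain (illustrative only).
# W is a symmetric weight (adjacency) matrix.
n = 5
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0

# Combinatorial Laplacian L = D - W, where D is the degree matrix.
D = np.diag(W.sum(axis=1))
L = D - W

# The two defining properties from the abstract:
assert np.allclose(L.sum(axis=1), 0.0)                # zero row sums
assert np.all(L - np.diag(np.diag(L)) <= 0.0)         # nonpositive off-diagonals

# Diagonalizing L yields an orthonormal eigenvector basis; the
# smoothest eigenvectors (smallest eigenvalues) give a
# low-dimensional representation for approximating value functions.
eigvals, eigvecs = np.linalg.eigh(L)
basis = eigvecs[:, :3]   # a 3-dimensional representation
print(basis.shape)       # (5, 3)
```

The smallest eigenvalue is always 0 (its eigenvector is constant), reflecting the zero-row-sum property; the remaining low-order eigenvectors vary smoothly over the state graph, which is what makes them useful basis functions.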

Cite

Text

Mahadevan. "Learning Representation and Control in Markov Decision Processes: New Frontiers." Foundations and Trends in Machine Learning, 2009. doi:10.1561/2200000003

Markdown

[Mahadevan. "Learning Representation and Control in Markov Decision Processes: New Frontiers." Foundations and Trends in Machine Learning, 2009.](https://mlanthology.org/ftml/2009/mahadevan2009ftml-learning/) doi:10.1561/2200000003

BibTeX

@article{mahadevan2009ftml-learning,
  title     = {{Learning Representation and Control in Markov Decision Processes: New Frontiers}},
  author    = {Mahadevan, Sridhar},
  journal   = {Foundations and Trends in Machine Learning},
  year      = {2009},
  pages     = {403--565},
  doi       = {10.1561/2200000003},
  volume    = {1},
  url       = {https://mlanthology.org/ftml/2009/mahadevan2009ftml-learning/}
}