Learning First-Order Markov Models for Control
Abstract
First-order Markov models have been successfully applied to many problems, for example in modeling sequential data using Markov chains, and modeling control problems using the Markov decision processes (MDP) formalism. If a first-order Markov model's parameters are estimated from data, the standard maximum likelihood estimator considers only the first-order (single-step) transitions. But for many problems, the first-order conditional independence assumptions are not satisfied, and as a result the higher order transition probabilities may be poorly approximated. Motivated by the problem of learning an MDP's parameters for control, we propose an algorithm for learning a first-order Markov model that explicitly takes into account higher order interactions during training. Our algorithm uses an optimization criterion different from maximum likelihood, and allows us to learn models that capture longer range effects, but without giving up the benefits of using first-order Markov models. Our experimental results also show the new algorithm outperforming conventional maximum likelihood estimation in a number of control problems where the MDP's parameters are estimated from data.
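The abstract's central observation can be illustrated concretely: the maximum likelihood estimate of a first-order Markov chain is just the normalized single-step transition counts, and when the data actually comes from a higher-order process, the fitted model's multi-step predictions can diverge from the empirical multi-step frequencies. The sketch below (an illustration of the phenomenon, not the paper's proposed algorithm; the simulated second-order process is our own assumption) fits a first-order chain by maximum likelihood and compares its two-step prediction, the matrix square of the transition matrix, against the empirical two-step frequencies.

```python
import numpy as np

def fit_first_order(seq, n_states):
    # Maximum likelihood estimate of a first-order Markov chain:
    # normalized single-step transition counts.
    counts = np.zeros((n_states, n_states))
    for a, b in zip(seq[:-1], seq[1:]):
        counts[a, b] += 1
    return counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)

def empirical_k_step(seq, n_states, k):
    # Empirical k-step transition frequencies, estimated directly.
    counts = np.zeros((n_states, n_states))
    for a, b in zip(seq[:-k], seq[k:]):
        counts[a, b] += 1
    return counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)

# Simulate a SECOND-order process: x_t copies x_{t-2} with prob 0.9,
# otherwise it is uniform random. This violates the first-order
# conditional independence assumption.
rng = np.random.default_rng(0)
seq = [0, 1]
for _ in range(5000):
    seq.append(seq[-2] if rng.random() < 0.9 else int(rng.integers(2)))

T = fit_first_order(seq, 2)
# The first-order model predicts two-step transitions via T @ T,
# which here differs noticeably from the empirical two-step frequencies.
gap = np.abs(T @ T - empirical_k_step(seq, 2, 2)).max()
print(gap)
```

With this data the gap is on the order of a few percent: the maximum likelihood fit reproduces single-step statistics well but systematically mispredicts two-step behavior, which is exactly the failure mode the paper's alternative training criterion targets.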
Cite
Text
Abbeel and Ng. "Learning First-Order Markov Models for Control." Neural Information Processing Systems, 2004.
Markdown
[Abbeel and Ng. "Learning First-Order Markov Models for Control." Neural Information Processing Systems, 2004.](https://mlanthology.org/neurips/2004/abbeel2004neurips-learning/)
BibTeX
@inproceedings{abbeel2004neurips-learning,
title = {{Learning First-Order Markov Models for Control}},
author = {Abbeel, Pieter and Ng, Andrew Y.},
booktitle = {Neural Information Processing Systems},
year = {2004},
pages = {1-8},
url = {https://mlanthology.org/neurips/2004/abbeel2004neurips-learning/}
}