Efficient Reinforcement Learning in Factored MDPs
Abstract
We present a provably efficient and near-optimal algorithm for reinforcement learning in Markov decision processes (MDPs) whose transition model can be factored as a dynamic Bayesian network (DBN). Our algorithm generalizes the recent algorithm of Kearns and Singh, and assumes that we are given both an algorithm for approximate planning, and the graphical structure (but not the parameters) of the DBN. Unlike the original algorithm, our new algorithm exploits the DBN structure to achieve a running time that scales polynomially in the number of parameters of the DBN, which may be exponentially smaller than the number of global states.
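To illustrate the representational saving the abstract refers to, here is a minimal sketch (not the paper's algorithm) of a DBN-factored transition model: each next-state variable depends only on a small parent set in the current state, so the joint transition probability is a product of local conditional probabilities. All variable names, parent sets, and probability values below are hypothetical toy choices.

```python
from itertools import product

# Toy DBN over two binary state variables. parents[i] lists the indices
# of the current-state variables that next-state variable i depends on.
parents = {0: (0,), 1: (0, 1)}

# Local conditional probability tables: cpt[i][parent_values] = P(x_i' = 1 | parents).
cpt = {
    0: {(0,): 0.1, (1,): 0.8},
    1: {(0, 0): 0.2, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9},
}

def transition_prob(state, next_state):
    """P(next_state | state), computed as a product of per-variable factors."""
    p = 1.0
    for i, x_next in enumerate(next_state):
        pa = tuple(state[j] for j in parents[i])
        p1 = cpt[i][pa]
        p *= p1 if x_next == 1 else 1.0 - p1
    return p

# Sanity check: the factored distribution sums to 1 over all next states.
total = sum(transition_prob((1, 0), ns) for ns in product((0, 1), repeat=2))
```

The point of the factorization is that the model is specified by the handful of CPT entries above, whereas a flat representation of the same dynamics would need a probability for every pair of global states, a number exponential in the state variables.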
Cite
Text
Kearns and Koller. "Efficient Reinforcement Learning in Factored MDPs." International Joint Conference on Artificial Intelligence, 1999.
Markdown
[Kearns and Koller. "Efficient Reinforcement Learning in Factored MDPs." International Joint Conference on Artificial Intelligence, 1999.](https://mlanthology.org/ijcai/1999/kearns1999ijcai-efficient/)
BibTeX
@inproceedings{kearns1999ijcai-efficient,
title = {{Efficient Reinforcement Learning in Factored MDPs}},
author = {Kearns, Michael J. and Koller, Daphne},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {1999},
pages = {740--747},
url = {https://mlanthology.org/ijcai/1999/kearns1999ijcai-efficient/}
}