The Infinite Regionalized Policy Representation
Abstract
We introduce the infinite regionalized policy representation (iRPR), a nonparametric policy for reinforcement learning in partially observable Markov decision processes (POMDPs). The iRPR assumes an unbounded set of decision states a priori, and infers the number of states needed to represent the policy from experience. We propose algorithms for learning the number of decision states while maintaining a proper balance between exploration and exploitation. Convergence analysis is provided, along with performance evaluations on benchmark problems.
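The nonparametric idea in the abstract — an unbounded set of decision states whose effective number is inferred rather than fixed — is commonly formalized with a stick-breaking (Dirichlet-process) prior. The following is a minimal illustrative sketch of that generic construction, not the paper's actual iRPR inference algorithm; the function name and truncation threshold are assumptions for illustration.

```python
import random

def stick_breaking_weights(alpha, threshold=1e-4, max_states=1000):
    """Sample mixture weights from a truncated stick-breaking construction.

    Illustrates the nonparametric principle behind models like the iRPR:
    an a priori unbounded set of states, whose effective number emerges
    from the weights rather than being fixed in advance. This is a
    generic Dirichlet-process sketch, not the paper's method.
    """
    weights = []
    remaining = 1.0  # probability mass not yet assigned to a state
    while remaining > threshold and len(weights) < max_states:
        # Break off a Beta(1, alpha)-distributed fraction of the
        # remaining stick; smaller alpha concentrates mass on few states.
        b = random.betavariate(1.0, alpha)
        weights.append(remaining * b)
        remaining *= (1.0 - b)
    return weights

random.seed(0)
w = stick_breaking_weights(alpha=2.0)
print(len(w))   # effective number of decision states for this draw
print(sum(w))   # total mass, close to 1 after truncation
```

Larger values of `alpha` tend to spread mass over more states, so the sampled "number of decision states" grows with the concentration parameter.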
Cite
Text
Liu et al. "The Infinite Regionalized Policy Representation." International Conference on Machine Learning, 2011.
Markdown
[Liu et al. "The Infinite Regionalized Policy Representation." International Conference on Machine Learning, 2011.](https://mlanthology.org/icml/2011/liu2011icml-infinite/)
BibTeX
@inproceedings{liu2011icml-infinite,
title = {{The Infinite Regionalized Policy Representation}},
author = {Liu, Miao and Liao, Xuejun and Carin, Lawrence},
booktitle = {International Conference on Machine Learning},
year = {2011},
pages = {769-776},
url = {https://mlanthology.org/icml/2011/liu2011icml-infinite/}
}