Conjugate Markov Decision Processes
Abstract
Many open problems involve the search for a mapping that is used by an algorithm solving an MDP. Useful mappings are often from the state set to some other set. Examples include representation discovery (a mapping to a feature space) and skill discovery (a mapping to skill termination probabilities). Different mappings result in algorithms achieving varying expected returns. In this paper we present a novel approach to searching over any mapping used by any algorithm attempting to solve an MDP, for the mapping that results in maximum expected return.
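The abstract's claim that different state mappings yield different achievable returns can be illustrated with a toy example. The following sketch is not from the paper; it is a hypothetical three-state chain MDP in which an agent's policy can only condition on a feature mapping of the state. Exhaustive policy search over the feature space shows that the identity mapping admits a higher return than a mapping that aliases all states together.

```python
# Hypothetical illustration (not from the paper): the choice of state
# mapping limits the best return an algorithm can achieve.
import itertools

# Chain MDP: states 0, 1, 2; actions 0 ("left"), 1 ("right").
# Deterministic moves; reward 1 whenever the state actually changes.
def step(s, a):
    s2 = min(s + 1, 2) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 != s else 0.0)

def evaluate(policy_on_features, mapping, horizon=4):
    """Return of a policy that only observes mapping(s), starting at state 0."""
    s, total = 0, 0.0
    for _ in range(horizon):
        s, r = step(s, policy_on_features[mapping(s)])
        total += r
    return total

def best_return(mapping, n_features, horizon=4):
    """Exhaustively search all deterministic policies over the feature space."""
    best = float("-inf")
    for acts in itertools.product([0, 1], repeat=n_features):
        best = max(best, evaluate(dict(enumerate(acts)), mapping, horizon))
    return best

identity = lambda s: s   # 3 features: no state aliasing
aliased  = lambda s: 0   # 1 feature: every state looks the same

print(best_return(identity, 3))  # bounce between states 1 and 2: return 4.0
print(best_return(aliased, 1))   # forced constant action: return 2.0
```

Under the identity mapping the agent can reverse direction at the boundary and collect reward every step, while the fully aliased mapping forces one constant action, so half the reward is unreachable no matter how the policy over features is chosen.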
Cite
Text
Thomas and Barto. "Conjugate Markov Decision Processes." International Conference on Machine Learning, 2011.

Markdown
[Thomas and Barto. "Conjugate Markov Decision Processes." International Conference on Machine Learning, 2011.](https://mlanthology.org/icml/2011/thomas2011icml-conjugate/)

BibTeX
@inproceedings{thomas2011icml-conjugate,
title = {{Conjugate Markov Decision Processes}},
author = {Thomas, Philip S. and Barto, Andrew G.},
booktitle = {International Conference on Machine Learning},
year = {2011},
pages = {137--144},
url = {https://mlanthology.org/icml/2011/thomas2011icml-conjugate/}
}