A Reinforcement Learning Algorithm with Polynomial Interaction Complexity for Only-Costly-Observable MDPs
Abstract
An Unobservable MDP (UMDP) is a POMDP in which there are no observations. An Only-Costly-Observable MDP (OCOMDP) is a POMDP that extends a UMDP by adding a particular costly action which completely observes the state. We introduce UR-MAX, a reinforcement learning algorithm with polynomial interaction complexity for unknown OCOMDPs.
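The model class in the abstract can be sketched concretely. The following is a minimal illustrative sketch (not the paper's UR-MAX algorithm; all names and the interface are assumptions): a UMDP where ordinary actions yield no observation, extended with one special action that reveals the state exactly at a fixed cost.

```python
import random

# Hypothetical sentinel for the single costly observing action (assumption).
OBSERVE = "observe"

class OCOMDP:
    """Sketch of an Only-Costly-Observable MDP environment (illustrative only)."""

    def __init__(self, states, actions, transitions, rewards, observe_cost, start):
        self.states = states      # finite state set
        self.actions = actions    # ordinary (unobservable) actions
        self.T = transitions      # T[s][a] -> dict mapping next state to probability
        self.R = rewards          # R[s][a] -> expected reward
        self.cost = observe_cost  # cost charged for the observing action
        self.state = start        # current (hidden) state

    def step(self, action):
        """Apply an action; only OBSERVE returns an observation, at a cost."""
        if action == OBSERVE:
            # The observing action reveals the state completely but is costly.
            return self.state, -self.cost
        probs = self.T[self.state][action]
        reward = self.R[self.state][action]
        self.state = random.choices(list(probs), weights=list(probs.values()))[0]
        # Ordinary actions produce no observation, as in a UMDP.
        return None, reward
```

Under this sketch, a learner interacts blindly through ordinary actions and must decide when paying for `OBSERVE` is worth the information it yields.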
Cite
Text
Fox and Tennenholtz. "A Reinforcement Learning Algorithm with Polynomial Interaction Complexity for Only-Costly-Observable MDPs." AAAI Conference on Artificial Intelligence, 2007.
Markdown
[Fox and Tennenholtz. "A Reinforcement Learning Algorithm with Polynomial Interaction Complexity for Only-Costly-Observable MDPs." AAAI Conference on Artificial Intelligence, 2007.](https://mlanthology.org/aaai/2007/fox2007aaai-reinforcement/)
BibTeX
@inproceedings{fox2007aaai-reinforcement,
  title = {{A Reinforcement Learning Algorithm with Polynomial Interaction Complexity for Only-Costly-Observable MDPs}},
  author = {Fox, Roy and Tennenholtz, Moshe},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year = {2007},
  pages = {553--558},
  url = {https://mlanthology.org/aaai/2007/fox2007aaai-reinforcement/}
}