A Reinforcement Learning Algorithm with Polynomial Interaction Complexity for Only-Costly-Observable MDPs

Abstract

An Unobservable MDP (UMDP) is a POMDP in which there are no observations. An Only-Costly-Observable MDP (OCOMDP) is a POMDP that extends a UMDP by allowing a particular costly action which completely observes the state. We introduce UR-MAX, a reinforcement learning algorithm with polynomial interaction complexity for unknown OCOMDPs.
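The model class in the abstract can be illustrated with a minimal sketch. The class and member names below are assumptions for illustration, not the paper's notation: regular actions yield rewards but no observation, while one special action reveals the current state at a fixed cost.

```python
from dataclasses import dataclass
import random

# Minimal sketch of an OCOMDP (assumed names, not from the paper):
# an unobservable MDP plus a special costly action that fully
# observes the current state.

@dataclass
class OCOMDP:
    n_states: int
    transitions: dict   # (state, action) -> next-state probability list
    rewards: dict       # (state, action) -> immediate reward
    observe_cost: float # cost of the special observing action
    state: int = 0      # hidden from the agent

    def step(self, action):
        """Take a regular action: a reward is received, but the
        resulting state stays hidden (no observation is returned)."""
        reward = self.rewards[(self.state, action)]
        probs = self.transitions[(self.state, action)]
        self.state = random.choices(range(self.n_states), weights=probs)[0]
        return reward

    def observe(self):
        """The special costly action: reveals the state, pays the cost."""
        return self.state, -self.observe_cost

# A two-state deterministic example.
m = OCOMDP(
    n_states=2,
    transitions={(0, 0): [0, 1], (1, 0): [1, 0]},
    rewards={(0, 0): 1.0, (1, 0): 0.0},
    observe_cost=0.5,
)
r = m.step(0)       # reward 1.0; agent does not see the new state
s, c = m.observe()  # reveals state 1 at cost 0.5
```

UR-MAX's task, in this framing, is to learn a near-optimal policy while deciding when the information from `observe` is worth its cost.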

Cite

Text

Fox and Tennenholtz. "A Reinforcement Learning Algorithm with Polynomial Interaction Complexity for Only-Costly-Observable MDPs." AAAI Conference on Artificial Intelligence, 2007.

Markdown

[Fox and Tennenholtz. "A Reinforcement Learning Algorithm with Polynomial Interaction Complexity for Only-Costly-Observable MDPs." AAAI Conference on Artificial Intelligence, 2007.](https://mlanthology.org/aaai/2007/fox2007aaai-reinforcement/)

BibTeX

@inproceedings{fox2007aaai-reinforcement,
  title     = {{A Reinforcement Learning Algorithm with Polynomial Interaction Complexity for Only-Costly-Observable MDPs}},
  author    = {Fox, Roy and Tennenholtz, Moshe},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2007},
  pages     = {553--558},
  url       = {https://mlanthology.org/aaai/2007/fox2007aaai-reinforcement/}
}