Active Reinforcement Learning
Abstract
When the transition probabilities and rewards of a Markov Decision Process (MDP) are known, the agent can obtain the optimal policy without any interaction with the environment. However, exact transition probabilities are difficult for experts to specify. One option left to the agent is a long and potentially costly exploration of the environment. In this paper, we propose another alternative: given an initial (possibly inaccurate) specification of the MDP, the agent determines the sensitivity of the optimal policy to changes in transitions and rewards. It then focuses its exploration on the regions of the space to which the optimal policy is most sensitive. We show that the proposed exploration strategy performs well on several control and planning problems.
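The core idea of the abstract can be illustrated with a minimal sketch. The snippet below is not the authors' algorithm; it is a hypothetical toy example (made-up 2-state, 2-action MDP) showing one way to probe policy sensitivity: solve the nominal MDP by value iteration, perturb individual transition entries, and flag the entries whose perturbation flips the greedy optimal policy. Under the paper's proposal, such sensitive entries are the ones exploration should target.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Solve a finite MDP. P[a] is an SxS transition matrix, R an SxA reward array.
    Returns the greedy optimal policy (one action index per state)."""
    S, A = R.shape
    V = np.zeros(S)
    while True:
        # Q[s, a] = R[s, a] + gamma * sum_s' P[a][s, s'] * V[s']
        Q = R + gamma * np.stack([P[a] @ V for a in range(A)], axis=1)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return Q.argmax(axis=1)
        V = V_new

# Toy MDP with made-up numbers (illustration only, not from the paper).
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),   # transitions under action 0
     np.array([[0.1, 0.9], [0.6, 0.4]])]   # transitions under action 1
R = np.array([[1.0, 0.0], [0.0, 2.0]])

base = value_iteration(P, R)

# Sensitivity probe: shift probability mass in state 0's transition row for
# each action and record whether the optimal policy changes. Actions whose
# perturbation flips the policy are candidates for focused exploration.
eps = 0.15
sensitive = []
for a in range(2):
    P_pert = [m.copy() for m in P]
    row = P_pert[a][0].copy()
    row[0], row[1] = min(1.0, row[0] + eps), max(0.0, row[1] - eps)
    P_pert[a][0] = row / row.sum()  # renormalize to a valid distribution
    if not np.array_equal(value_iteration(P_pert, R), base):
        sensitive.append(a)

print("baseline policy:", base, "sensitive actions at state 0:", sensitive)
```

The paper's actual method quantifies sensitivity analytically rather than by brute-force perturbation; this sketch only conveys the exploration-targeting intuition.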
Cite
Text
Epshteyn et al. "Active Reinforcement Learning." International Conference on Machine Learning, 2008. doi:10.1145/1390156.1390194
Markdown
[Epshteyn et al. "Active Reinforcement Learning." International Conference on Machine Learning, 2008.](https://mlanthology.org/icml/2008/epshteyn2008icml-active/) doi:10.1145/1390156.1390194
BibTeX
@inproceedings{epshteyn2008icml-active,
title = {{Active Reinforcement Learning}},
author = {Epshteyn, Arkady and Vogel, Adam and DeJong, Gerald},
booktitle = {International Conference on Machine Learning},
year = {2008},
pages = {296-303},
doi = {10.1145/1390156.1390194},
url = {https://mlanthology.org/icml/2008/epshteyn2008icml-active/}
}