Using MDP Characteristics to Guide Exploration in Reinforcement Learning

Abstract

We present a new approach to exploration in Reinforcement Learning (RL) based on certain properties of Markov Decision Processes (MDPs). Our strategy encourages a more uniform visitation of the state space, more extensive sampling of actions whose action-value estimates have potentially high variance, and a focus on states where the agent has the most control over the outcomes of its actions. The strategy can be combined with other existing exploration techniques, and we demonstrate experimentally that it improves the performance of both undirected and directed exploration methods. In contrast to other directed methods, the exploration-relevant information can be precomputed beforehand and then used during learning at no additional computational cost.
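The abstract names two of the exploration biases concretely: favoring rarely visited states and favoring actions whose value estimates have high variance. As a rough illustration only (not the paper's actual criterion, which relies on precomputed MDP characteristics), these two biases could be combined into an action-selection bonus built from visit counts and an online variance estimate; every name below (`select_action`, `beta`, the Welford-style `update_stats`) is our own:

```python
import math
from collections import defaultdict

def update_stats(counts, mean, m2, key, target):
    """Welford's online update of the count, mean, and sum of squared
    deviations (m2) of observed TD targets for a (state, action) pair."""
    counts[key] += 1
    delta = target - mean[key]
    mean[key] += delta / counts[key]
    m2[key] += delta * (target - mean[key])

def variance(counts, m2, key):
    """Sample variance of the targets seen so far for this pair."""
    n = counts[key]
    return m2[key] / (n - 1) if n > 1 else 0.0

def select_action(q, counts, m2, state, actions, beta=1.0):
    """Greedy choice over Q plus a bonus that grows with the target
    variance and shrinks with the visit count, so rarely tried and
    high-variance actions are sampled more often."""
    def score(a):
        key = (state, a)
        bonus = beta * (math.sqrt(variance(counts, m2, key)) + 1.0) / (counts[key] + 1)
        return q[key] + bonus
    return max(actions, key=score)
```

For example, an action tried many times with stable targets receives a small bonus, while an untried action keeps a bonus of `beta`, so the agent is pushed toward the less-sampled choice even when its current Q-value is lower.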

Cite

Text

Ratitch and Precup. "Using MDP Characteristics to Guide Exploration in Reinforcement Learning." European Conference on Machine Learning, 2003. doi:10.1007/978-3-540-39857-8_29

Markdown

[Ratitch and Precup. "Using MDP Characteristics to Guide Exploration in Reinforcement Learning." European Conference on Machine Learning, 2003.](https://mlanthology.org/ecmlpkdd/2003/ratitch2003ecml-using/) doi:10.1007/978-3-540-39857-8_29

BibTeX

@inproceedings{ratitch2003ecml-using,
  title     = {{Using MDP Characteristics to Guide Exploration in Reinforcement Learning}},
  author    = {Ratitch, Bohdana and Precup, Doina},
  booktitle = {European Conference on Machine Learning},
  year      = {2003},
  pages     = {313--324},
  doi       = {10.1007/978-3-540-39857-8_29},
  url       = {https://mlanthology.org/ecmlpkdd/2003/ratitch2003ecml-using/}
}