ε-MDPs: Learning in Varying Environments

Abstract

In this paper ε-MDP-models are introduced and convergence theorems are proven using the generalized MDP framework of Szepesvári and Littman. Using this model family, we show that Q-learning is capable of finding near-optimal policies in varying environments. The potential of this new family of MDP models is illustrated via a reinforcement learning algorithm called event-learning, which separates the optimization of decision making from the controller. We show that event-learning augmented by a particular controller, which gives rise to an ε-MDP, enables near-optimal performance even if considerable and sudden changes occur in the environment. Illustrations are provided on the two-segment pendulum problem.
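The abstract refers to Q-learning finding near-optimal policies in varying environments. The sketch below is a minimal tabular Q-learning loop in Python on a toy two-state MDP whose transition probabilities drift slowly over time; it illustrates the standard Q-learning update only, not the paper's event-learning algorithm or its ε-MDP construction, and the step function, drift schedule, and all parameter values are hypothetical.

import numpy as np

# Minimal tabular Q-learning on a tiny 2-state, 2-action MDP whose
# transition probabilities drift slowly over time (a stand-in for a
# "varying environment"; the paper's ε-MDP construction is more general).

rng = np.random.default_rng(0)
n_states, n_actions = 2, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps_greedy = 0.1, 0.9, 0.1

def step(state, action, t):
    """Hypothetical environment: action 1 usually leads to state 1,
    which pays reward 1; its success probability drifts with time t."""
    p_success = 0.9 - 0.2 * np.sin(t / 1000.0)  # slow drift
    next_state = 1 if (action == 1 and rng.random() < p_success) else 0
    reward = 1.0 if next_state == 1 else 0.0
    return next_state, reward

state = 0
for t in range(20000):
    # epsilon-greedy action selection
    if rng.random() < eps_greedy:
        action = int(rng.integers(n_actions))
    else:
        action = int(np.argmax(Q[state]))
    next_state, reward = step(state, action, t)
    # standard Q-learning update
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (td_target - Q[state, action])
    state = next_state

print(Q)  # the learned values should favor action 1 despite the drifting dynamics

Because the drift keeps the transition probabilities within a bounded distance of a fixed model, the learned Q-values settle near the optimal ones rather than converging exactly, which is the kind of near-optimality guarantee the paper formalizes for ε-MDPs.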

Cite

Text

Szita et al. "ε-MDPs: Learning in Varying Environments." Journal of Machine Learning Research, 2002.

Markdown

[Szita et al. "ε-MDPs: Learning in Varying Environments." Journal of Machine Learning Research, 2002.](https://mlanthology.org/jmlr/2002/szita2002jmlr-mdps/)

BibTeX

@article{szita2002jmlr-mdps,
  title     = {{ε-MDPs: Learning in Varying Environments}},
  author    = {Szita, István and Takács, Bálint and Lőrincz, András},
  journal   = {Journal of Machine Learning Research},
  year      = {2002},
  pages     = {145--174},
  volume    = {3},
  url       = {https://mlanthology.org/jmlr/2002/szita2002jmlr-mdps/}
}