ε-MDPs: Learning in Varying Environments
Abstract
In this paper, ε-MDP models are introduced and convergence theorems are proven using the generalized MDP framework of Szepesvári and Littman. Using this model family, we show that Q-learning is capable of finding near-optimal policies in varying environments. The potential of this new family of MDP models is illustrated via a reinforcement learning algorithm called event-learning, which separates the optimization of decision making from the controller. We show that event-learning augmented by a particular controller, which gives rise to an ε-MDP, enables near-optimal performance even when considerable and sudden changes occur in the environment. Illustrations are provided on the two-segment pendulum problem.
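The abstract's claim that Q-learning finds near-optimal policies under bounded environmental variation can be illustrated with a toy experiment. The sketch below is not the paper's algorithm: it runs ordinary tabular Q-learning on a small chain MDP whose slip probability is re-drawn each episode within an ε-band around a base value, loosely mimicking an ε-MDP whose dynamics stay ε-close to a fixed model. All state/action sizes, learning parameters, and the `slip` mechanism are illustrative assumptions.

```python
import random

# Hypothetical example: tabular Q-learning on a 5-state chain whose
# transition noise varies each episode within an eps-band, mimicking
# an eps-MDP. All parameters below are illustrative assumptions.

N_STATES, N_ACTIONS = 5, 2   # chain of 5 states; actions: 0 = left, 1 = right
GAMMA, ALPHA = 0.9, 0.1      # discount factor and learning rate

def step(state, action, slip):
    """Move along the chain; with probability `slip` the move is reversed."""
    direction = 1 if action == 1 else -1
    if random.random() < slip:           # environment varies via slip prob.
        direction = -direction
    next_state = min(max(state + direction, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def train(episodes=2000, base_slip=0.1, eps=0.05, seed=0):
    random.seed(seed)
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        # Dynamics drift within an eps-band around the base model.
        slip = base_slip + random.uniform(-eps, eps)
        state = 0
        for _ in range(20):
            # Epsilon-greedy action selection.
            if random.random() < 0.1:
                action = random.randrange(N_ACTIONS)
            else:
                action = max(range(N_ACTIONS), key=lambda a: Q[state][a])
            next_state, reward = step(state, action, slip)
            # Standard Q-learning update toward the bootstrapped target.
            target = reward + GAMMA * max(Q[next_state])
            Q[state][action] += ALPHA * (target - Q[state][action])
            state = next_state
    return Q

Q = train()
# Despite the varying slip probability, "right" should come to dominate
# in every non-terminal state.
print(all(Q[s][1] > Q[s][0] for s in range(N_STATES - 1)))
```

The point of the sketch is only that the learned greedy policy remains stable while the transition probabilities fluctuate from episode to episode, which is the intuition behind the paper's ε-MDP convergence results.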
Cite

Text

Szita et al. "ε-MDPs: Learning in Varying Environments." Journal of Machine Learning Research, 2002.

Markdown

[Szita et al. "ε-MDPs: Learning in Varying Environments." Journal of Machine Learning Research, 2002.](https://mlanthology.org/jmlr/2002/szita2002jmlr-mdps/)

BibTeX
@article{szita2002jmlr-mdps,
title = {{ε-MDPs: Learning in Varying Environments}},
author = {Szita, István and Takács, Bálint and Lőrincz, András},
journal = {Journal of Machine Learning Research},
year = {2002},
pages = {145-174},
volume = {3},
url = {https://mlanthology.org/jmlr/2002/szita2002jmlr-mdps/}
}