An Environment Model for Nonstationary Reinforcement Learning
Abstract
Reinforcement learning in nonstationary environments is generally regarded as an important and yet difficult problem. This paper partially addresses the problem by formalizing a subclass of nonstationary environments. The environment model, called hidden-mode Markov decision process (HM-MDP), assumes that environmental changes are always confined to a small number of hidden modes. A mode basically indexes a Markov decision process (MDP) and evolves with time according to a Markov chain. While HM-MDP is a special case of partially observable Markov decision processes (POMDPs), modeling an HM-MDP environment via the more general POMDP model unnecessarily increases the problem complexity. A variant of the Baum-Welch algorithm is developed for model learning, requiring less data and time than the general POMDP approach.
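To make the generative structure described in the abstract concrete, below is a minimal sketch (not code from the paper) of how an HM-MDP environment could be simulated: the hidden mode evolves according to its own Markov chain, while the observed state and reward follow the MDP indexed by the current mode. The class name, array layout, and NumPy-based interface are illustrative assumptions.

```python
import numpy as np

class HMMDPEnv:
    """Minimal hidden-mode MDP simulator (illustrative sketch only).

    mode_trans : (M, M)        Markov chain over the hidden modes
    P          : (M, A, S, S)  per-mode state-transition probabilities
    R          : (M, A, S)     per-mode expected rewards
    """

    def __init__(self, mode_trans, P, R, seed=None):
        self.mode_trans, self.P, self.R = mode_trans, P, R
        self.rng = np.random.default_rng(seed)
        self.mode = 0    # hidden from the agent
        self.state = 0   # fully observable, as in an ordinary MDP

    def step(self, action):
        # Reward and next state are drawn from the MDP indexed by the current mode.
        reward = self.R[self.mode, action, self.state]
        num_states = self.P.shape[-1]
        self.state = self.rng.choice(num_states,
                                     p=self.P[self.mode, action, self.state])
        # The hidden mode then evolves according to its own Markov chain,
        # which is what makes the environment look nonstationary to the agent.
        self.mode = self.rng.choice(self.mode_trans.shape[0],
                                    p=self.mode_trans[self.mode])
        return self.state, reward
```

A toy instance can be built by filling these arrays with valid probability distributions; estimating them from observed state-action-reward trajectories, without ever seeing the mode, is the model-learning task addressed by the paper's Baum-Welch variant.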
Cite

Text

Choi et al. "An Environment Model for Nonstationary Reinforcement Learning." Neural Information Processing Systems, 1999.

Markdown

[Choi et al. "An Environment Model for Nonstationary Reinforcement Learning." Neural Information Processing Systems, 1999.](https://mlanthology.org/neurips/1999/choi1999neurips-environment/)

BibTeX
@inproceedings{choi1999neurips-environment,
title = {{An Environment Model for Nonstationary Reinforcement Learning}},
author = {Choi, Samuel P. M. and Yeung, Dit-Yan and Zhang, Nevin Lianwen},
booktitle = {Neural Information Processing Systems},
year = {1999},
pages = {987-993},
url = {https://mlanthology.org/neurips/1999/choi1999neurips-environment/}
}