An Environment Model for Nonstationary Reinforcement Learning

Abstract

Reinforcement learning in nonstationary environments is generally regarded as an important and yet difficult problem. This paper partially addresses the problem by formalizing a subclass of nonstationary environments. The environment model, called hidden-mode Markov decision process (HM-MDP), assumes that environmental changes are always confined to a small number of hidden modes. A mode basically indexes a Markov decision process (MDP) and evolves with time according to a Markov chain. While HM-MDP is a special case of partially observable Markov decision processes (POMDP), modeling an HM-MDP environment via the more general POMDP model unnecessarily increases the problem complexity. A variant of the Baum-Welch algorithm is developed for model learning, requiring less data and time.
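To make the structure concrete, below is a minimal Python sketch of an HM-MDP as described in the abstract: a small set of hidden modes, each indexing an ordinary MDP over shared state and action spaces, with the active mode evolving over time according to its own Markov chain. All class and function names here are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch of an HM-MDP environment. Names (ModeMDP,
# HiddenModeMDP, step) are illustrative, not from the paper.
from dataclasses import dataclass

import numpy as np


@dataclass
class ModeMDP:
    """One hidden mode: a standard MDP sharing state/action spaces with the others."""
    transitions: np.ndarray  # shape (A, S, S): P(s' | s, a) within this mode
    rewards: np.ndarray      # shape (S, A): expected immediate reward


@dataclass
class HiddenModeMDP:
    modes: list                   # list of ModeMDP, one per hidden mode
    mode_transitions: np.ndarray  # shape (M, M): Markov chain over modes

    def step(self, mode, state, action, rng):
        """Advance one time step: the observable transition and reward follow the
        current mode's MDP, while the hidden mode evolves by its own Markov chain."""
        mdp = self.modes[mode]
        next_state = rng.choice(mdp.transitions.shape[-1],
                                p=mdp.transitions[action, state])
        reward = mdp.rewards[state, action]
        next_mode = rng.choice(len(self.modes), p=self.mode_transitions[mode])
        return next_mode, next_state, reward
```

Because the mode is hidden from the agent, flattening this structure into a generic POMDP would require belief tracking over all mode-state pairs; the sketch above keeps the modes explicit, which is what the paper exploits to learn the model with less data and time.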

Cite

Text

Choi et al. "An Environment Model for Nonstationary Reinforcement Learning." Neural Information Processing Systems, 1999.

Markdown

[Choi et al. "An Environment Model for Nonstationary Reinforcement Learning." Neural Information Processing Systems, 1999.](https://mlanthology.org/neurips/1999/choi1999neurips-environment/)

BibTeX

@inproceedings{choi1999neurips-environment,
  title     = {{An Environment Model for Nonstationary Reinforcement Learning}},
  author    = {Choi, Samuel P. M. and Yeung, Dit-Yan and Zhang, Nevin Lianwen},
  booktitle = {Neural Information Processing Systems},
  year      = {1999},
  pages     = {987--993},
  url       = {https://mlanthology.org/neurips/1999/choi1999neurips-environment/}
}