Reinforcement Learning for MDPs with Constraints

Abstract

In this article, I will consider Markov Decision Processes with two criteria, each defined as the expected value of an infinite-horizon cumulative return. The second criterion is either itself subject to an inequality constraint, or there is a maximum allowable probability that the individual returns violate the constraint. I describe and discuss three new reinforcement learning approaches for solving such control problems.
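The setting described in the abstract can be written as a constrained MDP. The following is a generic formalization, not notation taken from the paper: $r_t$ and $c_t$ denote the two per-step returns, $d$ a constraint threshold, and $\varepsilon$ the maximum allowable violation probability (a discount factor $\gamma$ is assumed to make the infinite-horizon sums well defined):

```latex
\max_{\pi}\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_t\right]
\quad \text{subject to} \quad
\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} c_t\right] \le d
\quad \text{or} \quad
\Pr_{\pi}\!\left(\sum_{t=0}^{\infty} \gamma^{t} c_t > d\right) \le \varepsilon .
```

The first variant constrains the second criterion in expectation; the second bounds the probability that a single realized return violates the constraint, matching the two cases the abstract distinguishes.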

Cite

Text

Geibel. "Reinforcement Learning for MDPs with Constraints." European Conference on Machine Learning, 2006. doi:10.1007/11871842_63

Markdown

[Geibel. "Reinforcement Learning for MDPs with Constraints." European Conference on Machine Learning, 2006.](https://mlanthology.org/ecmlpkdd/2006/geibel2006ecml-reinforcement/) doi:10.1007/11871842_63

BibTeX

@inproceedings{geibel2006ecml-reinforcement,
  title     = {{Reinforcement Learning for MDPs with Constraints}},
  author    = {Geibel, Peter},
  booktitle = {European Conference on Machine Learning},
  year      = {2006},
  pages     = {646--653},
  doi       = {10.1007/11871842_63},
  url       = {https://mlanthology.org/ecmlpkdd/2006/geibel2006ecml-reinforcement/}
}