Constrained Upper Confidence Reinforcement Learning
Abstract
Constrained Markov Decision Processes are a class of stochastic decision problems in which the decision maker must select a policy that satisfies auxiliary cost constraints. This paper extends upper confidence reinforcement learning to settings in which the reward function and the constraints, described by cost functions, are unknown a priori but the transition kernel is known. Such a setting is well-motivated by a number of applications, including exploration of unknown, potentially unsafe, environments. We present an algorithm, C-UCRL, and show that it achieves sub-linear regret with respect to the reward while satisfying the constraints with high probability, even during learning. An illustrative example is provided.
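Since the transition kernel is known, the learning burden falls on estimating the reward and cost functions from observed samples. The sketch below is a minimal, hypothetical illustration of the upper-confidence idea the abstract alludes to: optimistic estimates for the reward and conservative (upper) estimates for the cost, so that a policy judged feasible under the bounds remains feasible with high probability. The function name, constants, and Hoeffding-style bound form are assumptions for illustration, not the paper's exact construction.

```python
import numpy as np

def confidence_bounds(sum_r, sum_c, counts, t, delta=0.05):
    """Per (state, action) confidence bounds for a CMDP with known transitions.

    sum_r, sum_c : accumulated observed rewards / costs per (s, a)
    counts       : number of visits per (s, a)
    t            : current episode index (enters the log term)
    delta        : confidence parameter

    Returns an optimistic reward estimate and a conservative cost estimate,
    both clipped to [0, 1] (rewards and costs assumed bounded in [0, 1]).
    """
    n = np.maximum(counts, 1)
    # Hoeffding-style confidence radius; the exact constants are an assumption.
    radius = np.sqrt(np.log(2.0 * counts.size * max(t, 1) / delta) / (2.0 * n))
    r_ucb = np.clip(sum_r / n + radius, 0.0, 1.0)  # optimism for the reward
    c_ucb = np.clip(sum_c / n + radius, 0.0, 1.0)  # pessimism for the cost
    return r_ucb, c_ucb
```

With the transition kernel given, a constrained planner (e.g., a linear program over occupancy measures) could then maximize return under `r_ucb` subject to the budget constraint evaluated with `c_ucb`; this is only a sketch of the general optimism-under-constraints pattern, not the algorithm as specified in the paper.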
Cite
Text
Zheng and Ratliff. "Constrained Upper Confidence Reinforcement Learning." Proceedings of the 2nd Conference on Learning for Dynamics and Control, 2020.
Markdown
[Zheng and Ratliff. "Constrained Upper Confidence Reinforcement Learning." Proceedings of the 2nd Conference on Learning for Dynamics and Control, 2020.](https://mlanthology.org/l4dc/2020/zheng2020l4dc-constrained/)
BibTeX
@inproceedings{zheng2020l4dc-constrained,
  title     = {{Constrained Upper Confidence Reinforcement Learning}},
  author    = {Zheng, Liyuan and Ratliff, Lillian},
  booktitle = {Proceedings of the 2nd Conference on Learning for Dynamics and Control},
  year      = {2020},
  pages     = {620--629},
  volume    = {120},
  url       = {https://mlanthology.org/l4dc/2020/zheng2020l4dc-constrained/}
}