Constrained Reinforcement Learning via Policy Splitting
Abstract
We develop a model-free reinforcement learning approach to solve constrained Markov decision processes, where the objective and budget constraints are infinite-horizon discounted expectations and the rewards and costs are learned sequentially from data. We propose a two-stage procedure that first searches over deterministic policies and then aggregates them with a mixture-parameter search, generating policies with simultaneous guarantees on near-optimality and feasibility. We also numerically illustrate our approach by applying it to an online advertising problem.
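The second-stage aggregation can be illustrated with a minimal sketch. This is not the authors' exact algorithm: the toy tabular MDP, the two pre-selected deterministic policies, the per-step randomization between them, and the budget value below are all illustrative assumptions. The idea shown is the mixture-parameter search, where a line search over the mixture weight picks a randomized policy that meets the discounted-cost budget while keeping the estimated discounted reward as high as possible.

# Illustrative sketch of a mixture-parameter search for a constrained MDP.
# Assumptions (not from the paper): toy random MDP, two given deterministic
# policies (reward-greedy vs. low-cost), per-step mixing, Monte Carlo rollouts.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma, budget = 4, 2, 0.95, 8.0

# Toy MDP: random transition kernel; action 1 earns more reward but more cost.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
reward = np.array([[0.2, 1.0]] * n_states)   # reward[s, a]
cost = np.array([[0.1, 0.9]] * n_states)     # cost[s, a]

pi_reward = np.ones(n_states, dtype=int)     # deterministic, reward-greedy
pi_safe = np.zeros(n_states, dtype=int)      # deterministic, low-cost

def discounted_estimates(policy_fn, horizon=400, episodes=200):
    """Monte Carlo estimates of discounted reward and cost from state 0."""
    tot_r, tot_c = 0.0, 0.0
    for _ in range(episodes):
        s, disc = 0, 1.0
        for _ in range(horizon):
            a = policy_fn(s)
            tot_r += disc * reward[s, a]
            tot_c += disc * cost[s, a]
            s = rng.choice(n_states, p=P[s, a])
            disc *= gamma
    return tot_r / episodes, tot_c / episodes

# Stage 2 (sketch): line search over the mixture weight alpha in [0, 1];
# at each step the mixed policy plays the reward-greedy action w.p. alpha.
best = None
for alpha in np.linspace(0.0, 1.0, 11):
    mix = lambda s, a=alpha: pi_reward[s] if rng.random() < a else pi_safe[s]
    r_hat, c_hat = discounted_estimates(mix)
    if c_hat <= budget and (best is None or r_hat > best[1]):
        best = (alpha, r_hat, c_hat)

print("selected mixture weight, est. reward, est. cost:", best)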
Cite
Text
Chen et al. "Constrained Reinforcement Learning via Policy Splitting." Proceedings of The 12th Asian Conference on Machine Learning, 2020.
Markdown
[Chen et al. "Constrained Reinforcement Learning via Policy Splitting." Proceedings of The 12th Asian Conference on Machine Learning, 2020.](https://mlanthology.org/acml/2020/chen2020acml-constrained/)
BibTeX
@inproceedings{chen2020acml-constrained,
title = {{Constrained Reinforcement Learning via Policy Splitting}},
author = {Chen, Haoxian and Lam, Henry and Li, Fengpei and Meisami, Amirhossein},
booktitle = {Proceedings of The 12th Asian Conference on Machine Learning},
year = {2020},
pages = {209--224},
volume = {129},
url = {https://mlanthology.org/acml/2020/chen2020acml-constrained/}
}