Deterministic Policies for Constrained Reinforcement Learning in Polynomial Time

Abstract

We present a novel algorithm that efficiently computes near-optimal deterministic policies for constrained reinforcement learning (CRL) problems. Our approach combines three key ideas: (1) value-demand augmentation, (2) action-space approximate dynamic programming, and (3) time-space rounding. Our algorithm constitutes a fully polynomial-time approximation scheme (FPTAS) for any time-space recursive (TSR) cost criterion. A TSR criterion requires the cost of a policy to be computable recursively over both time and (state) space; this class includes classical expectation, almost-sure, and anytime constraints. Our work answers three open questions spanning two long-standing lines of research, showing that polynomial-time approximability is possible for (1) anytime-constrained policies, (2) almost-sure-constrained policies, and (3) deterministic expectation-constrained policies.
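To make the augmentation idea concrete, the sketch below illustrates budget-augmented backward induction for a deterministic policy under an anytime constraint on a finite-horizon MDP with nonnegative integer costs. It is a simplified illustration only: the function name, data layout, and toy inputs are assumptions, and it omits the paper's value-demand formulation, action-space approximate dynamic programming, and time-space rounding steps, so it is not the FPTAS itself.

def anytime_constrained_dp(P, R, C, H, budget):
    # Simplified illustration of budget augmentation, not the paper's algorithm.
    # P[s][a]: dict {next_state: probability > 0}, R[s][a]: reward,
    # C[s][a]: nonnegative integer cost, H: horizon,
    # budget: integer anytime budget B (running cost must never exceed B).
    S, A = len(P), len(P[0])
    NEG = float("-inf")

    # V[h][s][b]: best reward-to-go from state s at step h with b budget
    # units remaining; -inf marks infeasible augmented states.
    V = [[[NEG] * (budget + 1) for _ in range(S)] for _ in range(H + 1)]
    pi = [[[None] * (budget + 1) for _ in range(S)] for _ in range(H)]

    for s in range(S):
        for b in range(budget + 1):
            V[H][s][b] = 0.0  # no further reward or cost after the horizon

    # Backward induction over time, states, and remaining budget.
    for h in range(H - 1, -1, -1):
        for s in range(S):
            for b in range(budget + 1):
                for a in range(A):
                    c = C[s][a]
                    if c > b:
                        continue  # action would violate the anytime constraint
                    future = sum(p * V[h + 1][s2][b - c]
                                 for s2, p in P[s][a].items())
                    q = R[s][a] + future
                    if q > V[h][s][b]:
                        V[h][s][b] = q
                        pi[h][s][b] = a  # deterministic choice at (h, s, b)
    return V, pi

# Toy usage (hypothetical data): 2 states, 2 actions, horizon 3, budget 1.
P = [[{0: 1.0}, {1: 1.0}], [{1: 1.0}, {0: 1.0}]]
R = [[0.0, 1.0], [0.5, 0.0]]
C = [[0, 1], [0, 0]]
V, pi = anytime_constrained_dp(P, R, C, H=3, budget=1)
print(V[0][0][1], pi[0][0][1])  # optimal constrained value and first action

With integer costs this augmented dynamic program is exact; for real-valued costs, a rounding step in the spirit of the paper's time-space rounding is what keeps the augmented budget space polynomially sized.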

Cite

Text

McMahan. "Deterministic Policies for Constrained Reinforcement Learning in Polynomial Time." Neural Information Processing Systems, 2024. doi:10.52202/079017-2994

Markdown

[McMahan. "Deterministic Policies for Constrained Reinforcement Learning in Polynomial Time." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/mcmahan2024neurips-deterministic/) doi:10.52202/079017-2994

BibTeX

@inproceedings{mcmahan2024neurips-deterministic,
  title     = {{Deterministic Policies for Constrained Reinforcement Learning in Polynomial Time}},
  author    = {McMahan, Jeremy},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-2994},
  url       = {https://mlanthology.org/neurips/2024/mcmahan2024neurips-deterministic/}
}