Polynomial-Time Approximability of Constrained Reinforcement Learning

Abstract

We study the computational complexity of approximating general constrained Markov decision processes. Our primary contribution is the design of a polynomial-time $(0,\epsilon)$-additive bicriteria approximation algorithm for finding optimal constrained policies across a broad class of recursively computable constraints, including almost-sure, chance, expectation, and their anytime variants. Matching lower bounds imply our approximation guarantees are optimal so long as $P \neq NP$. The generality of our approach yields answers to several long-standing open complexity questions in the constrained reinforcement learning literature. Specifically, we are the first to prove polynomial-time approximability for the following settings: policies under chance constraints, deterministic policies under multiple expectation constraints, policies under non-homogeneous constraints (i.e., constraints of different types), and policies under constraints for continuous-state processes.
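For readers unfamiliar with the terminology, the $(0,\epsilon)$-additive bicriteria guarantee can be read as follows; this is a hedged paraphrase of the abstract using notation ($\hat{\pi}$, $V^\pi$, $C_i^\pi$, $c_i$) introduced here for illustration, not the paper's formal statement:

$$
V^{\hat{\pi}} \;\ge\; \max_{\pi \in \Pi_{\mathrm{feasible}}} V^{\pi}
\qquad \text{and} \qquad
C_i^{\hat{\pi}} \;\le\; c_i + \epsilon \quad \text{for every constraint } i,
$$

where $V^{\pi}$ denotes the objective value of policy $\pi$, $C_i^{\pi}$ the value of the $i$-th constraint, and $c_i$ its threshold. The first coordinate ($0$) means the returned policy loses nothing in objective value relative to the best feasible policy, while the second ($\epsilon$) permits an additive violation of at most $\epsilon$ in each constraint.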

Cite

Text

McMahan. "Polynomial-Time Approximability of Constrained Reinforcement Learning." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[McMahan. "Polynomial-Time Approximability of Constrained Reinforcement Learning." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/mcmahan2025icml-polynomialtime/)

BibTeX

@inproceedings{mcmahan2025icml-polynomialtime,
  title     = {{Polynomial-Time Approximability of Constrained Reinforcement Learning}},
  author    = {McMahan, Jeremy},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {43417--43439},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/mcmahan2025icml-polynomialtime/}
}