Budgeted Reinforcement Learning in Continuous State Space

Abstract

A Budgeted Markov Decision Process (BMDP) is an extension of a Markov Decision Process to critical applications requiring safety constraints. It relies on a notion of risk implemented as an upper bound on a constraint-violation signal that, importantly, can be modified in real time. So far, BMDPs could only be solved in the case of finite state spaces with known dynamics. This work extends the state of the art to continuous-state environments and unknown dynamics. We show that the solution to a BMDP is the fixed point of a novel Budgeted Bellman Optimality operator. This observation allows us to introduce natural extensions of Deep Reinforcement Learning algorithms to address large-scale BMDPs. We validate our approach on two simulated applications: spoken dialogue and autonomous driving.
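
For readers who want the shape of that operator, here is a minimal sketch in our own notation (not lifted verbatim from the paper): the state is augmented with the remaining budget, \bar{s} = (s, \beta), the action with a budget allocation, \bar{a} = (a, \beta_a), and the value function Q = (Q_r, Q_c) tracks expected return and expected cost. A budgeted Bellman optimality operator can then be written as

    \mathcal{T} Q(\bar{s}, \bar{a}) = \bar{R}(\bar{s}, \bar{a}) + \gamma \sum_{\bar{s}'} \bar{P}(\bar{s}' \mid \bar{s}, \bar{a}) \sum_{\bar{a}'} \pi_{\mathrm{greedy}}(\bar{a}' \mid \bar{s}'; Q) \, Q(\bar{s}', \bar{a}')

where the greedy policy maximizes expected return subject to the expected cost staying within the budget:

    \pi_{\mathrm{greedy}}(\cdot \mid \bar{s}; Q) \in \arg\max_{\rho} \; \mathbb{E}_{\bar{a} \sim \rho} \left[ Q_r(\bar{s}, \bar{a}) \right] \quad \text{s.t.} \quad \mathbb{E}_{\bar{a} \sim \rho} \left[ Q_c(\bar{s}, \bar{a}) \right] \le \beta .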

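The constrained greedy step is where this departs from standard Q-learning. Below is a minimal Python sketch of that step over a discrete action set; the function name and the deterministic selection are our own simplifications (the paper's greedy policy is stochastic and mixes two actions on a convex hull of cost/return estimates):

import numpy as np

def budgeted_greedy(q_r, q_c, budget):
    """Return the index of the highest-return action whose expected cost
    fits the remaining budget; fall back to the cheapest action if none does.
    Simplified, deterministic stand-in for the paper's greedy policy.

    q_r, q_c -- arrays of shape (n_actions,): return and cost estimates
    budget   -- scalar upper bound on expected cost
    """
    feasible = np.flatnonzero(q_c <= budget)
    if feasible.size == 0:
        return int(np.argmin(q_c))  # no action is safe: minimize cost
    return int(feasible[np.argmax(q_r[feasible])])

# Usage: three actions with (return, cost) estimates and a budget of 0.5.
q_r = np.array([1.0, 2.0, 3.0])
q_c = np.array([0.0, 0.4, 0.9])
print(budgeted_greedy(q_r, q_c, budget=0.5))  # -> 1
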
Cite

Text

Carrara et al. "Budgeted Reinforcement Learning in Continuous State Space." Neural Information Processing Systems, 2019.

Markdown

[Carrara et al. "Budgeted Reinforcement Learning in Continuous State Space." Neural Information Processing Systems, 2019.](https://mlanthology.org/neurips/2019/carrara2019neurips-budgeted/)

BibTeX

@inproceedings{carrara2019neurips-budgeted,
  title     = {{Budgeted Reinforcement Learning in Continuous State Space}},
  author    = {Carrara, Nicolas and Leurent, Edouard and Laroche, Romain and Urvoy, Tanguy and Maillard, Odalric-Ambrym and Pietquin, Olivier},
  booktitle = {Neural Information Processing Systems},
  year      = {2019},
  pages     = {9299--9309},
  url       = {https://mlanthology.org/neurips/2019/carrara2019neurips-budgeted/}
}