POPCORN: Partially Observed Prediction Constrained Reinforcement Learning

Abstract

Many medical decision-making tasks can be framed as partially observed Markov decision processes (POMDPs). However, prevailing two-stage approaches that first learn a POMDP and then solve it often fail, because the model that best fits the data may not be the one best suited for planning. We introduce a new optimization objective that (a) produces both high-performing policies and high-quality generative models, even when some observations are irrelevant for planning, and (b) does so in the batch, off-policy settings typical of healthcare, where only retrospective data are available. We demonstrate our approach on synthetic examples and a challenging medical decision-making problem.
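To ground the abstract's central idea, the following is a minimal sketch of a prediction-constrained objective in the spirit of the one described above: the generative likelihood of the learned POMDP is traded off against an off-policy estimate of the value of the policy obtained by planning in that model. This is an illustrative assumption rather than the paper's exact equation, and the symbols (theta for POMDP parameters, D for the batch of retrospective trajectories, pi_theta for the planned policy, lambda for the trade-off weight) are not the paper's notation.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Hedged sketch (not the paper's exact equation): a prediction-constrained
% objective trading off generative fit against policy value.
% theta: POMDP parameters; D: batch of retrospective trajectories;
% pi_theta: policy obtained by planning in the learned POMDP; lambda >= 0.
\begin{equation*}
  \max_{\theta}\;
  \underbrace{\log p(\mathcal{D}\mid\theta)}_{\text{generative fit}}
  \;+\;
  \lambda\,
  \underbrace{\widehat{V}(\pi_{\theta})}_{\text{off-policy value estimate}}
\end{equation*}
\end{document}
```

Setting lambda to zero recovers the standard two-stage pipeline (fit the model, then plan), while larger lambda pushes the learned model toward parameters that also support high-value policies, which is the trade-off the abstract motivates.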

Cite

Text

Futoma et al. "POPCORN: Partially Observed Prediction Constrained Reinforcement Learning." Artificial Intelligence and Statistics, 2020.

Markdown

[Futoma et al. "POPCORN: Partially Observed Prediction Constrained Reinforcement Learning." Artificial Intelligence and Statistics, 2020.](https://mlanthology.org/aistats/2020/futoma2020aistats-popcorn/)

BibTeX

@inproceedings{futoma2020aistats-popcorn,
  title     = {{POPCORN: Partially Observed Prediction Constrained Reinforcement Learning}},
  author    = {Futoma, Joseph and Hughes, Michael and Doshi-Velez, Finale},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2020},
  pages     = {3578--3588},
  volume    = {108},
  url       = {https://mlanthology.org/aistats/2020/futoma2020aistats-popcorn/}
}