Robust Fitted-Q-Evaluation and Iteration Under Sequentially Exogenous Unobserved Confounders

Bruns-Smith, David; Zhou, Angela

NeurIPSW 2023

Abstract

Offline reinforcement learning is important in domains such as medicine, economics, and e-commerce, where online experimentation is costly, dangerous, or unethical and where the true model is unknown. We study robust policy evaluation and policy optimization in the presence of sequentially exogenous unobserved confounders under a sensitivity model. We propose and analyze orthogonalized robust fitted-Q-iteration, which uses closed-form solutions of the robust Bellman operator to derive a loss minimization problem for the robust Q function and adds a bias correction to quantile estimation. Our algorithm enjoys the computational ease of fitted-Q-iteration and statistical improvements (reduced dependence on quantile estimation error) from orthogonalization. We provide sample complexity bounds and insights, and we show effectiveness both in simulations and on real-world longitudinal healthcare data on treating sepsis. In particular, our model of sequential unobserved confounders yields an online Markov decision process rather than a partially observed Markov decision process; we illustrate how this can enable warm-starting optimistic reinforcement learning algorithms with valid robust bounds from observational data.
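To give a rough feel for what a closed-form robust backup can look like, the following is a minimal sketch and not the paper's estimator. It assumes that the unobserved confounding can only reweight next-step outcomes by a factor lying in a known interval `[a, b]` with `a <= 1 <= b`, and that the reweighting must average to one. Under that assumption, the worst-case mean puts the largest allowed weight on the smallest outcomes, which is why a quantile shows up in closed-form solutions of this kind. The names `worst_case_mean` and `robust_backup`, and the bounds `a` and `b`, are hypothetical and chosen only for illustration.

```python
import numpy as np


def worst_case_mean(y, a, b):
    """Smallest value of mean(w * y) over weights w with a <= w_i <= b and mean(w) == 1.

    Illustrative only: a and b are assumed, user-supplied bounds on how much
    unobserved confounding may reweight each sample.
    """
    if not (0.0 <= a <= 1.0 <= b):
        raise ValueError("need 0 <= a <= 1 <= b")
    y = np.sort(np.asarray(y, dtype=float))  # ascending: smallest outcomes first
    n = y.size
    budget = n * (1.0 - a)   # extra weight to hand out once every sample has weight a
    cap = b - a              # most extra weight any single sample can absorb
    # Greedy allocation: fill the smallest outcomes up to b until the budget runs out.
    extra = np.clip(budget - cap * np.arange(n), 0.0, cap)
    w = a + extra
    return float(np.mean(w * y))


def robust_backup(rewards, next_values, gamma, a, b):
    """Pessimistic one-step target for a single (state, action) pair.

    rewards and next_values are samples of r and V(s') observed for that pair;
    the target is the worst-case mean of r + gamma * V(s').
    """
    rewards = np.asarray(rewards, dtype=float)
    next_values = np.asarray(next_values, dtype=float)
    return worst_case_mean(rewards + gamma * next_values, a, b)


if __name__ == "__main__":
    v_next = [0.0, 1.0, 2.0, 3.0]
    print(worst_case_mean(v_next, 1.0, 1.0))  # no confounding allowed: plain mean
    print(worst_case_mean(v_next, 0.5, 2.0))  # pessimistic value under reweighting
```

With `a = b = 1` the function reduces to the ordinary sample mean, so the gap between the two printed values is the price of robustness under the assumed bounds. A fitted-Q variant would regress targets like these on state-action features rather than averaging per pair.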

Cite

Text

Bruns-Smith and Zhou. "Robust Fitted-Q-Evaluation and Iteration Under Sequentially Exogenous Unobserved Confounders." NeurIPS 2023 Workshops: ReALML, 2023.

Markdown

[Bruns-Smith and Zhou. "Robust Fitted-Q-Evaluation and Iteration Under Sequentially Exogenous Unobserved Confounders." NeurIPS 2023 Workshops: ReALML, 2023.](https://mlanthology.org/neuripsw/2023/brunssmith2023neuripsw-robust/)

BibTeX

@inproceedings{brunssmith2023neuripsw-robust,
  title     = {{Robust Fitted-Q-Evaluation and Iteration Under Sequentially Exogenous Unobserved Confounders}},
  author    = {Bruns-Smith, David and Zhou, Angela},
  booktitle = {NeurIPS 2023 Workshops: ReALML},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/brunssmith2023neuripsw-robust/}
}