Robust Fitted-Q-Evaluation and Iteration Under Sequentially Exogenous Unobserved Confounders
Abstract
Offline reinforcement learning is important in domains such as medicine, economics, and e-commerce, where online experimentation is costly, dangerous, or unethical, and where the true model is unknown. We study robust policy evaluation and policy optimization in the presence of sequentially exogenous unobserved confounders under a sensitivity model. We propose and analyze orthogonalized robust fitted-Q-iteration, which uses closed-form solutions of the robust Bellman operator to derive a loss minimization problem for the robust Q function and adds a bias correction to quantile estimation. Our algorithm retains the computational ease of fitted-Q-iteration and gains statistical improvements (reduced dependence on quantile estimation error) from orthogonalization. We provide sample complexity bounds and insights, and demonstrate effectiveness both in simulations and on real-world longitudinal healthcare data on treating sepsis. In particular, our model of sequential unobserved confounders yields an online Markov decision process rather than a partially observed Markov decision process; we illustrate how this can enable warm-starting optimistic reinforcement learning algorithms with valid robust bounds from observational data.
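To make the link between a closed-form robust backup and quantile estimation concrete, here is a minimal sketch of a generic worst-case mean under bounded reweighting. It is not the authors' algorithm: the abstract does not specify the sensitivity model, so the weight bounds `a` and `b` and the function name `worst_case_mean` are assumptions chosen for illustration. Under those assumptions, the minimizing weights put the upper bound on the smallest outcomes and the lower bound on the rest, switching at roughly the (1 - a) / (b - a) quantile of the outcomes, which is why a quantile estimate would enter such a backup.

```python
import numpy as np


def worst_case_mean(y, a, b):
    """Minimize mean(w * y) over weights with a <= w_i <= b and mean(w) == 1.

    Hypothetical helper for illustration; `a` and `b` are assumed
    sensitivity bounds with 0 <= a <= 1 <= b.
    """
    y = np.asarray(y, dtype=float)
    if not (0.0 <= a <= 1.0 <= b):
        raise ValueError("need 0 <= a <= 1 <= b")
    n = y.size
    # Start every weight at the lower bound, then spend the remaining
    # mass on the smallest outcomes first (a fractional-knapsack solution).
    w = np.full(n, a, dtype=float)
    budget = n * (1.0 - a)
    for i in np.argsort(y):
        extra = min(b - a, budget)
        w[i] += extra
        budget -= extra
        if budget <= 0.0:
            break
    return float(np.mean(w * y)), w


# Example: four outcomes, weights allowed to range over [0.5, 1.5].
value, weights = worst_case_mean([0.0, 1.0, 2.0, 3.0], a=0.5, b=1.5)
print(value, weights)
```

With `a = b = 1` the weights are forced to one and the function returns the ordinary sample mean, so the gap between the two values gives a rough sense of how much the assumed confounding could lower the estimate.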
Cite
Text
Bruns-Smith and Zhou. "Robust Fitted-Q-Evaluation and Iteration Under Sequentially Exogenous Unobserved Confounders." NeurIPS 2023 Workshops: ReALML, 2023.
Markdown
[Bruns-Smith and Zhou. "Robust Fitted-Q-Evaluation and Iteration Under Sequentially Exogenous Unobserved Confounders." NeurIPS 2023 Workshops: ReALML, 2023.](https://mlanthology.org/neuripsw/2023/brunssmith2023neuripsw-robust/)
BibTeX
@inproceedings{brunssmith2023neuripsw-robust,
title = {{Robust Fitted-Q-Evaluation and Iteration Under Sequentially Exogenous Unobserved Confounders}},
author = {Bruns-Smith, David and Zhou, Angela},
booktitle = {NeurIPS 2023 Workshops: ReALML},
year = {2023},
url = {https://mlanthology.org/neuripsw/2023/brunssmith2023neuripsw-robust/}
}