Off-Policy Estimation with Adaptively Collected Data: The Power of Online Learning

Abstract

We consider estimation of a linear functional of the treatment effect from adaptively collected data. This problem arises in a variety of applications, including off-policy evaluation in contextual bandits and estimation of the average treatment effect in causal inference. While a certain class of augmented inverse propensity weighting (AIPW) estimators enjoys desirable asymptotic properties, including semiparametric efficiency, much less is known about their non-asymptotic theory with adaptively collected data. To fill this gap, we first present generic upper bounds on the mean-squared error of the class of AIPW estimators; these bounds depend crucially on a sequentially weighted error between the treatment effect and its estimates. Motivated by this, we propose a general reduction scheme that produces a sequence of estimates for the treatment effect via online learning, so as to minimize the sequentially weighted estimation error. To illustrate this, we provide three concrete instantiations in (1) the tabular case; (2) the case of linear function approximation; and (3) the case of general function approximation for the outcome model. We then provide a local minimax lower bound to show the instance-dependent optimality of the AIPW estimator using no-regret online learning algorithms.
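As a rough illustration of the kind of estimator the abstract describes, the sketch below computes an AIPW estimate of a target policy's value in a tabular contextual bandit. It is a minimal sketch under stated assumptions, not the paper's algorithm: the function name `aipw_tabular`, the record layout `(x, a, y, p)`, and the use of a plain running mean per (context, action) cell as the online outcome estimate are all choices made here for illustration. In particular, the running mean is unweighted, whereas the abstract's reduction targets a sequentially weighted error. The one structural point it tries to capture is that the outcome estimate used at round t is built only from data observed before round t.

```python
def aipw_tabular(records, target_policy, n_contexts, n_actions):
    """AIPW estimate of a target policy's value (illustrative sketch).

    records: sequence of (x, a, y, p) tuples in collection order, where
        x is the context index, a the chosen action index, y the observed
        outcome, and p the probability with which the (possibly adaptive)
        behavior policy chose a at that round.
    target_policy: target_policy[x][a] is the probability that the target
        policy picks action a in context x.
    """
    if not records:
        raise ValueError("records must be non-empty")

    # Running sums and counts per (context, action) cell; these define the
    # online outcome estimate (a plain running mean, an assumption here).
    sums = [[0.0] * n_actions for _ in range(n_contexts)]
    counts = [[0] * n_actions for _ in range(n_contexts)]

    total = 0.0
    for x, a, y, p in records:
        # Outcome estimates for this round use only past rounds.
        mu_hat = [
            sums[x][b] / counts[x][b] if counts[x][b] > 0 else 0.0
            for b in range(n_actions)
        ]
        # Direct (plug-in) term under the target policy.
        direct = sum(target_policy[x][b] * mu_hat[b] for b in range(n_actions))
        # Importance-weighted residual correction.
        correction = target_policy[x][a] / p * (y - mu_hat[a])
        total += direct + correction

        # Update the running mean only after it has been used.
        sums[x][a] += y
        counts[x][a] += 1

    return total / len(records)
```

Because each round's outcome estimate depends only on earlier rounds, the importance-weighted correction has conditional mean equal to the plug-in error, which is what keeps this style of estimator unbiased for any such sequence of estimates; the quality of the estimates then affects only the variance.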

Cite

Text

Lee and Ma. "Off-Policy Estimation with Adaptively Collected Data: The Power of Online Learning." Neural Information Processing Systems, 2024. doi:10.52202/079017-4255

Markdown

[Lee and Ma. "Off-Policy Estimation with Adaptively Collected Data: The Power of Online Learning." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/lee2024neurips-offpolicy/) doi:10.52202/079017-4255

BibTeX

@inproceedings{lee2024neurips-offpolicy,
  title     = {{Off-Policy Estimation with Adaptively Collected Data: The Power of Online Learning}},
  author    = {Lee, Jeonghwan and Ma, Cong},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-4255},
  url       = {https://mlanthology.org/neurips/2024/lee2024neurips-offpolicy/}
}