Markovian Interference in Experiments

Abstract

We consider experiments in dynamical systems where interventions on some experimental units impact other units through a limiting constraint (such as a limited supply of products). Despite outsize practical importance, the best estimators for this 'Markovian' interference problem are largely heuristic in nature, and their bias is not well understood. We formalize the problem of inference in such experiments as one of policy evaluation. Off-policy estimators, while unbiased, apparently incur a large penalty in variance relative to state-of-the-art heuristics. We introduce an on-policy estimator: the Differences-In-Q's (DQ) estimator. We show that the DQ estimator can in general have exponentially smaller variance than off-policy evaluation. At the same time, its bias is second order in the impact of the intervention. This yields a striking bias-variance tradeoff so that the DQ estimator effectively dominates state-of-the-art alternatives. From a theoretical perspective, we introduce three separate novel techniques that are of independent interest in the theory of Reinforcement Learning (RL). Our empirical evaluation includes a set of experiments on a city-scale ride-hailing simulator.
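
The abstract names the Differences-In-Q's (DQ) estimator but does not define it here. As a rough illustration only, and not the authors' implementation, the sketch below shows one way a differences-in-Q's style estimate could be computed from a single Bernoulli-randomized trajectory: estimate Q-values under the experimentation policy with on-policy tabular TD(0), then take the difference of average Q-values between treated and untreated steps. The function names, the discounted formulation, and the tabular setting are all illustrative assumptions; the paper's exact construction and normalization may differ.

```python
import numpy as np

# Illustrative sketch only -- not the authors' code. Assumes a discrete state
# space, a single trajectory from a Bernoulli-randomized experiment, and a
# discounted tabular formulation (the paper's exact setup may differ).

def td_q_estimates(states, actions, rewards, n_states, gamma=0.99, lr=0.05, sweeps=50):
    """On-policy TD(0) estimate of Q(s, a) under the experimentation policy."""
    Q = np.zeros((n_states, 2))
    for _ in range(sweeps):
        for t in range(len(rewards) - 1):
            s, a, r = states[t], actions[t], rewards[t]
            # Bootstrap with the action actually taken at the next step (on-policy).
            target = r + gamma * Q[states[t + 1], actions[t + 1]]
            Q[s, a] += lr * (target - Q[s, a])
    return Q

def dq_estimate(states, actions, rewards, n_states, gamma=0.99):
    """Differences-in-Q's style estimate: mean Q over treated steps minus mean Q over control steps."""
    states, actions, rewards = map(np.asarray, (states, actions, rewards))
    Q = td_q_estimates(states, actions, rewards, n_states, gamma)
    q_taken = Q[states, actions]      # Q-value of each observed (state, action) pair
    treated = actions == 1
    return q_taken[treated].mean() - q_taken[~treated].mean()
```

The intuition, as suggested by the abstract, is that a naive difference-in-means over raw rewards ignores how each intervention shifts the shared system state; replacing rewards with on-policy Q-values is one way to account for that downstream (Markovian) impact while staying on-policy.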

Cite

Text

Farias et al. "Markovian Interference in Experiments." Neural Information Processing Systems, 2022.

Markdown

[Farias et al. "Markovian Interference in Experiments." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/farias2022neurips-markovian/)

BibTeX

@inproceedings{farias2022neurips-markovian,
  title     = {{Markovian Interference in Experiments}},
  author    = {Farias, Vivek and Li, Andrew and Peng, Tianyi and Zheng, Andrew},
  booktitle = {Neural Information Processing Systems},
  year      = {2022},
  url       = {https://mlanthology.org/neurips/2022/farias2022neurips-markovian/}
}