Adaptively Exploiting D-Separators with Causal Bandits
Abstract
Multi-armed bandit problems provide a framework to identify the optimal intervention over a sequence of repeated experiments. Without additional assumptions, minimax optimal performance (measured by cumulative regret) is well-understood. With access to additional observed variables that d-separate the intervention from the outcome (i.e., they are a d-separator), recent "causal bandit" algorithms provably incur less regret. However, in practice it is desirable to be agnostic to whether observed variables are a d-separator. Ideally, an algorithm should be adaptive; that is, perform nearly as well as an algorithm with oracle knowledge of the presence or absence of a d-separator. In this work, we formalize and study this notion of adaptivity, and provide a novel algorithm that simultaneously achieves (a) optimal regret when a d-separator is observed, improving on classical minimax algorithms, and (b) significantly smaller regret than recent causal bandit algorithms when the observed variables are not a d-separator. Crucially, our algorithm does not require any oracle knowledge of whether a d-separator is observed. We also generalize this adaptivity to other conditions, such as the front-door criterion.
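To make the setting concrete, below is a minimal toy sketch in Python (not the paper's algorithm): each arm, i.e. intervention A, shifts the distribution of an observed variable Z that d-separates A from the reward Y, so E[Y | A, Z] = E[Y | Z]. A learner that estimates P(Z | A) per arm but pools all samples to estimate E[Y | Z] shares information across arms, which is the mechanism that lets causal bandit methods beat arm-by-arm estimation. All distributions, the optimism bonus, and the constants are hypothetical choices for illustration only.

```python
"""Toy illustration of a bandit with an observed d-separator Z.

This is a hedged sketch, not the algorithm from the paper: the environment,
smoothing, and exploration bonus are all made-up choices for demonstration.
"""
import numpy as np

rng = np.random.default_rng(0)

K = 5        # number of arms (interventions)
n_z = 3      # support size of the candidate d-separator Z
T = 5000     # horizon

# Hypothetical environment: P(Z = z | A = a) and E[Y | Z = z]; Y | Z is Bernoulli.
p_z_given_a = rng.dirichlet(np.ones(n_z), size=K)   # shape (K, n_z)
mu_y_given_z = np.array([0.2, 0.5, 0.8])            # reward mean for each value of Z
true_means = p_z_given_a @ mu_y_given_z             # E[Y | do(A = a)]

def pull(a):
    """Sample Z from the arm's conditional, then the reward Y given Z only."""
    z = rng.choice(n_z, p=p_z_given_a[a])
    y = rng.binomial(1, mu_y_given_z[z])
    return z, y

# Learner state: per-arm counts for P(Z | A), pooled statistics for E[Y | Z].
z_counts = np.ones((K, n_z))   # Laplace-smoothed counts of Z for each arm
y_sum = np.zeros(n_z)
y_cnt = np.ones(n_z)           # smoothed counts for the pooled E[Y | Z] estimate

regret = 0.0
for t in range(T):
    p_hat = z_counts / z_counts.sum(axis=1, keepdims=True)  # estimate of P(Z | A)
    mu_hat = y_sum / y_cnt                                   # pooled estimate of E[Y | Z]
    bonus = np.sqrt(np.log(t + 2) / y_cnt)                   # simple optimism bonus
    values = p_hat @ (mu_hat + bonus)                        # optimistic value per arm
    a = int(np.argmax(values))
    z, y = pull(a)
    z_counts[a, z] += 1
    y_sum[z] += y      # every pull of every arm updates the shared E[Y | Z] estimate
    y_cnt[z] += 1
    regret += true_means.max() - true_means[a]

print(f"cumulative regret after {T} rounds: {regret:.1f}")
```

If Z is in fact a d-separator, pooling makes the reward estimates converge using all T samples rather than the per-arm sample counts; if Z is not a d-separator, this sketch can be badly biased, which is exactly the failure mode the paper's adaptive algorithm is designed to guard against.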
Cite
Text
Bilodeau et al. "Adaptively Exploiting D-Separators with Causal Bandits." Neural Information Processing Systems, 2022.
Markdown
[Bilodeau et al. "Adaptively Exploiting D-Separators with Causal Bandits." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/bilodeau2022neurips-adaptively/)
BibTeX
@inproceedings{bilodeau2022neurips-adaptively,
  title = {{Adaptively Exploiting D-Separators with Causal Bandits}},
  author = {Bilodeau, Blair and Wang, Linbo and Roy, Dan},
  booktitle = {Neural Information Processing Systems},
  year = {2022},
  url = {https://mlanthology.org/neurips/2022/bilodeau2022neurips-adaptively/}
}