Causal Bandits: Learning Good Interventions via Causal Inference
Abstract
We study the problem of using causal models to improve the rate at which good interventions can be learned online in a stochastic environment. Our formalism combines multi-arm bandits and causal inference to model a novel type of bandit feedback that is not exploited by existing approaches. We propose a new algorithm that exploits the causal feedback and prove a bound on its simple regret that is strictly better (in all quantities) than algorithms that do not use the additional causal information.
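The mechanism the abstract alludes to (a single observational sample informing the estimates of many interventional arms at once) can be made concrete with a small simulation. The sketch below is not the paper's algorithm; it is a minimal illustration of the causal-feedback idea in the parallel graph the paper studies, assuming binary causes X_1..X_N with no confounding, a Bernoulli reward, an illustrative observe-then-intervene budget split, and invented names such as `sample_env`.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 5     # number of binary cause variables X_1..X_N (parallel graph)
T = 2000  # total budget of rounds

# Illustrative ground truth: each X_i ~ Bernoulli(q_i) when not intervened
# on; the reward Y depends only on X_1 here.
q = rng.uniform(0.1, 0.9, size=N)

def sample_env(do_i=None, do_val=None):
    """One round: optionally intervene do(X_i = val), observe all X and Y."""
    x = (rng.random(N) < q).astype(int)
    if do_i is not None:
        x[do_i] = do_val
    p_reward = 0.6 if x[0] == 1 else 0.4  # only X_1 affects the reward
    y = int(rng.random() < p_reward)
    return x, y

# Phase 1: purely observational rounds. Because every variable is observed,
# one sample updates the estimate of every arm do(X_i = x) whose assignment
# happened to occur -- this is the causal feedback. In the parallel graph
# (independent causes, no confounding) E[Y | X_i = x] = E[Y | do(X_i = x)],
# so conditional averages are unbiased estimates of interventional rewards.
T_obs = T // 2
counts = np.zeros((N, 2))
reward_sums = np.zeros((N, 2))
for _ in range(T_obs):
    x, y = sample_env()
    counts[np.arange(N), x] += 1
    reward_sums[np.arange(N), x] += y

# Phase 2: spend the remaining budget intervening on the (arm, value) pairs
# that observation rarely covered, since those estimates have high variance.
rare = np.argsort(counts.flatten())        # flat index = 2 * i + value
budget_each = (T - T_obs) // N
for flat in rare[:N]:                      # top up the N rarest arms
    i, v = divmod(int(flat), 2)
    for _ in range(budget_each):
        _, y = sample_env(do_i=i, do_val=v)
        counts[i, v] += 1
        reward_sums[i, v] += y

mu_hat = reward_sums / np.maximum(counts, 1)
best = np.unravel_index(np.argmax(mu_hat), mu_hat.shape)
print(f"estimated best intervention: do(X_{best[0] + 1} = {best[1]}), "
      f"estimated reward {mu_hat[best]:.3f}")
```

Under the parallel-graph assumption, the observational phase estimates all 2N interventional arms simultaneously, whereas a standard bandit algorithm would need separate pulls for each arm; this is the gap the paper's simple-regret bound quantifies.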
Cite
Text
Lattimore et al. "Causal Bandits: Learning Good Interventions via Causal Inference." Neural Information Processing Systems, 2016.

Markdown
[Lattimore et al. "Causal Bandits: Learning Good Interventions via Causal Inference." Neural Information Processing Systems, 2016.](https://mlanthology.org/neurips/2016/lattimore2016neurips-causal/)

BibTeX
@inproceedings{lattimore2016neurips-causal,
  title     = {{Causal Bandits: Learning Good Interventions via Causal Inference}},
  author    = {Lattimore, Finnian and Lattimore, Tor and Reid, Mark D.},
  booktitle = {Neural Information Processing Systems},
  year      = {2016},
  pages     = {1181--1189},
  url       = {https://mlanthology.org/neurips/2016/lattimore2016neurips-causal/}
}