Causal Bandits: Learning Good Interventions via Causal Inference

Abstract

We study the problem of using causal models to improve the rate at which good interventions can be learned online in a stochastic environment. Our formalism combines multi-armed bandits and causal inference to model a novel type of bandit feedback that is not exploited by existing approaches. We propose a new algorithm that exploits the causal feedback and prove a bound on its simple regret that is strictly better (in all quantities) than algorithms that do not use the additional causal information.
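
To make the "causal feedback" concrete, below is a minimal Python sketch (not the authors' code) of the two-phase idea in the paper's parallel bandit special case: N independent binary causes X_1..X_N of a reward Y, where a single observational round do() reveals every X_i together with Y, and so gives an unbiased signal for all 2N interventional arms do(X_i = j) at once. The environment here (the vector q and sample_reward) is a hypothetical example, and the rarity threshold is a crude stand-in for the paper's m(q) quantity.

# Sketch of a two-phase causal bandit for the parallel bandit setting:
# observe first (causal feedback updates every arm), then intervene only
# on arms whose values are rarely observed. Environment is hypothetical.

import numpy as np

rng = np.random.default_rng(0)

N = 10                                # number of binary causal variables
T = 2000                              # total budget of rounds
q = rng.uniform(0.05, 0.95, size=N)   # P(X_i = 1) under pure observation

def sample_reward(x):
    # Hypothetical reward model: Y ~ Bernoulli, driven mainly by X_0.
    p = 0.2 + 0.6 * x[0]
    return int(rng.random() < p)

def sample_observation():
    """One do() round: draw all X_i observationally, then the reward Y."""
    x = (rng.random(N) < q).astype(int)
    return x, sample_reward(x)

def sample_intervention(i, j):
    """One do(X_i = j) round: clamp X_i, draw the rest observationally."""
    x = (rng.random(N) < q).astype(int)
    x[i] = j
    return sample_reward(x)

# Phase 1: spend T/2 rounds purely observing. Because every variable is
# revealed, each round updates the estimate of mu_{i,j} = E[Y | X_i = j]
# for all arms simultaneously -- feedback a standard bandit ignores.
counts = np.zeros((N, 2))
sums = np.zeros((N, 2))
for _ in range(T // 2):
    x, y = sample_observation()
    for i in range(N):
        counts[i, x[i]] += 1
        sums[i, x[i]] += y

mu_hat = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)

# Phase 2: arms whose value X_i = j is rarely observed are poorly
# estimated, so spend the remaining T/2 rounds intervening on them
# directly, splitting the budget evenly.
threshold = (T // 2) / (2 * N)
rare = [(i, j) for i in range(N) for j in range(2) if counts[i, j] < threshold]
if rare:
    per_arm = (T // 2) // len(rare)
    for (i, j) in rare:
        ys = [sample_intervention(i, j) for _ in range(per_arm)]
        mu_hat[i, j] = np.mean(ys)

# Report the empirically best intervention (simple-regret objective).
i_best, j_best = np.unravel_index(np.argmax(mu_hat), mu_hat.shape)
print(f"best arm: do(X_{i_best} = {j_best}), "
      f"estimated mean {mu_hat[i_best, j_best]:.3f}")

The intuition behind the improved simple-regret bound is visible in this sketch: observation estimates all frequently-observed arms in parallel, so the budget for direct interventions scales with the number of rare arms rather than with all 2N arms.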

Cite

Text

Lattimore et al. "Causal Bandits: Learning Good Interventions via Causal Inference." Neural Information Processing Systems, 2016.

Markdown

[Lattimore et al. "Causal Bandits: Learning Good Interventions via Causal Inference." Neural Information Processing Systems, 2016.](https://mlanthology.org/neurips/2016/lattimore2016neurips-causal/)

BibTeX

@inproceedings{lattimore2016neurips-causal,
  title     = {{Causal Bandits: Learning Good Interventions via Causal Inference}},
  author    = {Lattimore, Finnian and Lattimore, Tor and Reid, Mark D.},
  booktitle = {Neural Information Processing Systems},
  year      = {2016},
  pages     = {1181--1189},
  url       = {https://mlanthology.org/neurips/2016/lattimore2016neurips-causal/}
}