Counterfactual Structural Causal Bandits

Abstract

Causal reasoning lies at the heart of robust and generalizable decision-making, and the *Pearl Causal Hierarchy* provides a formal language for distinguishing between observational ($\mathcal{L}_1$), interventional ($\mathcal{L}_2$), and counterfactual ($\mathcal{L}_3$) levels of reasoning. Existing bandit algorithms that leverage causal knowledge have primarily operated within the $\mathcal{L}_1$ and $\mathcal{L}_2$ regimes, treating each realizable and physical intervention as a distinct arm. In doing so, they have largely excluded counterfactual quantities because of their perceived inaccessibility. In this paper, we introduce a *counterfactual structural causal bandit* (ctf-SCB) framework, which expands the agent's feasible action space beyond conventional observational and interventional arms to include a class of realizable counterfactual actions. Our framework offers a principled extension of structural causal bandits and paves the way for integrating counterfactual reasoning into sequential decision-making.
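To make the idea of an expanded action space concrete, here is a minimal toy sketch. It is not the authors' ctf-SCB algorithm. The model, the arm names (`obs`, `do(X=0)`, `do(X=1)`, `flip`), the reward table `P_REWARD` and the helpers `expected_reward`, `pull` and `ucb1` are all hypothetical and chosen for illustration only. The sketch assumes an unobserved confounder U, a natural intent X' = U that the agent can observe before acting, and a Bernoulli reward. The `flip` arm (take X = 1 - X') is realizable because the intent is observed first, yet its value depends on E[Y_x | X' = x'] for x ≠ x', which is an $\mathcal{L}_3$ quantity. Standard UCB1 is swapped in as the learner purely for illustration.

```python
import math
import random

# Hypothetical toy SCM (an illustrative assumption, not the paper's model):
# U ~ Bernoulli(0.5) is an unobserved confounder, the agent's natural
# intent is X' = U, and the reward is Y ~ Bernoulli(P_REWARD[x][u]).
P_REWARD = {0: {0: 0.2, 1: 0.7}, 1: {0: 0.6, 1: 0.3}}

# Each hypothetical arm maps the observed natural intent x' to the action taken.
ARMS = {
    "obs": lambda intent: intent,       # L1: act on the natural intent
    "do(X=0)": lambda intent: 0,        # L2: ignore the intent, force X = 0
    "do(X=1)": lambda intent: 1,        # L2: ignore the intent, force X = 1
    "flip": lambda intent: 1 - intent,  # intent-conditioned, counterfactual-style arm
}


def expected_reward(arm):
    """Exact mean reward of an arm, averaging over the confounder U."""
    policy = ARMS[arm]
    return sum(0.5 * P_REWARD[policy(u)][u] for u in (0, 1))


def pull(arm, rng):
    """Simulate one round: draw U, read the intent X' = U, act, observe Y."""
    u = 1 if rng.random() < 0.5 else 0
    x = ARMS[arm](u)  # the intent equals U in this toy model
    return 1 if rng.random() < P_REWARD[x][u] else 0


def ucb1(horizon, seed=0):
    """Plain UCB1 over the expanded arm set; returns pull counts per arm."""
    rng = random.Random(seed)
    names = list(ARMS)
    counts = {a: 0 for a in names}
    sums = {a: 0.0 for a in names}
    for t in range(1, horizon + 1):
        untried = [a for a in names if counts[a] == 0]
        if untried:
            arm = untried[0]  # play every arm once before using the index
        else:
            arm = max(
                names,
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        counts[arm] += 1
        sums[arm] += pull(arm, rng)
    return counts


if __name__ == "__main__":
    for name in ARMS:
        print(name, round(expected_reward(name), 2))
    print(ucb1(5000))
```

In this toy model the exact means are 0.25 for `obs`, 0.45 for each `do` arm and 0.65 for `flip`, so an agent restricted to $\mathcal{L}_1$ and $\mathcal{L}_2$ arms cannot reach the best achievable reward, while UCB1 over the expanded arm set concentrates its pulls on `flip`.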

Cite

Park and Lee. "Counterfactual Structural Causal Bandits." International Conference on Learning Representations, 2026.

BibTeX

@inproceedings{park2026iclr-counterfactual,
  title     = {{Counterfactual Structural Causal Bandits}},
  author    = {Park, Min Woo and Lee, Sanghack},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/park2026iclr-counterfactual/}
}