Counterfactual Structural Causal Bandits
Abstract
Causal reasoning lies at the heart of robust and generalizable decision-making, and the *Pearl Causal Hierarchy* provides a formal language for distinguishing between observational ($\mathcal{L}_1$), interventional ($\mathcal{L}_2$), and counterfactual ($\mathcal{L}_3$) levels of reasoning. Existing bandit algorithms that leverage causal knowledge have primarily operated within the $\mathcal{L}_1$ and $\mathcal{L}_2$ regimes, treating each realizable and physical intervention as a distinct arm. That is, they have largely excluded counterfactual quantities due to their perceived inaccessibility. In this paper, we introduce a *counterfactual structural causal bandit* (ctf-SCB) framework which expands the agent's feasible action space beyond conventional observational and interventional arms to include a class of realizable counterfactual actions. Our framework offers a principled extension of structural causal bandits and paves the way for integrating counterfactual reasoning into sequential decision-making.
Cite
Text
Park and Lee. "Counterfactual Structural Causal Bandits." International Conference on Learning Representations, 2026.
Markdown
[Park and Lee. "Counterfactual Structural Causal Bandits." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/park2026iclr-counterfactual/)
BibTeX
@inproceedings{park2026iclr-counterfactual,
title = {{Counterfactual Structural Causal Bandits}},
author = {Park, Min Woo and Lee, Sanghack},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/park2026iclr-counterfactual/}
}