Sample Complexity of Distinguishing Cause from Effect

Abstract

We study the sample complexity of causal structure learning on a two-variable system with observational and experimental data. Specifically, for two variables $X$ and $Y$, we consider the classical scenario where either $X$ causes $Y$, $Y$ causes $X$, or there is an unmeasured confounder between $X$ and $Y$. Let $m_1$ be the number of observational samples of $(X,Y)$, and let $m_2$ be the number of interventional samples where either $X$ or $Y$ has been subject to an external intervention. We show that if $X$ and $Y$ are over a finite domain of size $k$ and are significantly correlated, the minimum $m_2$ needed is sublinear in $k$. Moreover, as $m_1$ grows, the minimum $m_2$ needed to identify the causal structure decreases. In fact, we give a tight characterization of the tradeoff between $m_1$ and $m_2$ when $m_1 = O(k)$ or when $m_1$ is sufficiently large. We build upon techniques for closeness testing when $m_1$ is small (e.g., sublinear in $k$), and for non-parametric density estimation when $m_2$ is large. Our hardness results are based on carefully constructing causal models whose marginal and interventional distributions form hard instances of canonical property testing problems.
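The key structural fact the abstract relies on can be seen in a small simulation. The toy sketch below (not the paper's algorithm; all names, the domain size, and the simple total-variation comparison are illustrative assumptions) shows why interventional samples separate the two causal directions over a finite domain of size $k$: when $X$ causes $Y$, intervening on $Y$ leaves the marginal of $X$ unchanged, whereas conditioning on $Y$ does not, provided $X$ and $Y$ are correlated.

```python
# Toy illustration, assuming the ground truth is X -> Y over a domain of
# size k. It compares P(X | do(Y=y)) against P(X) and P(X | Y=y) from
# finite samples; none of this is the paper's actual tester.
import numpy as np

rng = np.random.default_rng(0)
k = 10  # domain size of X and Y

def random_dist(n):
    p = rng.random(n)
    return p / p.sum()

# Ground truth: X ~ p_x and Y | X = x ~ p_y_given_x[x]
p_x = random_dist(k)
p_y_given_x = np.stack([random_dist(k) for _ in range(k)])

def sample_observational(m):
    xs = rng.choice(k, size=m, p=p_x)
    ys = np.array([rng.choice(k, p=p_y_given_x[x]) for x in xs])
    return xs, ys

def sample_do_y(m):
    # Under X -> Y, intervening on Y does not affect X,
    # so X keeps its marginal distribution p_x.
    return rng.choice(k, size=m, p=p_x)

def empirical(samples):
    return np.bincount(samples, minlength=k) / len(samples)

m1, m2 = 5000, 2000          # observational / interventional sample sizes
xs_obs, ys_obs = sample_observational(m1)
xs_int = sample_do_y(m2)     # samples of X under do(Y = 0)
y0 = 0

marginal_x = empirical(xs_obs)
conditional_x = empirical(xs_obs[ys_obs == y0])
interventional_x = empirical(xs_int)

tv = lambda p, q: 0.5 * np.abs(p - q).sum()
# Under X -> Y: the first distance is small, the second is large when
# X and Y are correlated. Under Y -> X the roles would be reversed.
print(f"TV( P(X|do(Y=0)), P(X) )    = {tv(interventional_x, marginal_x):.3f}")
print(f"TV( P(X|do(Y=0)), P(X|Y=0)) = {tv(interventional_x, conditional_x):.3f}")
```

With finite $m_1$ and $m_2$, both distances are estimated with sampling error, which is where the closeness-testing and density-estimation machinery in the paper enters; this sketch only makes the underlying invariance concrete.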

Cite

Text

Acharya et al. "Sample Complexity of Distinguishing Cause from Effect." Artificial Intelligence and Statistics, 2023.

Markdown

[Acharya et al. "Sample Complexity of Distinguishing Cause from Effect." Artificial Intelligence and Statistics, 2023.](https://mlanthology.org/aistats/2023/acharya2023aistats-sample/)

BibTeX

@inproceedings{acharya2023aistats-sample,
  title     = {{Sample Complexity of Distinguishing Cause from Effect}},
  author    = {Acharya, Jayadev and Bhadane, Sourbh and Bhattacharyya, Arnab and Kandasamy, Saravanan and Sun, Ziteng},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2023},
  pages     = {10487--10504},
  volume    = {206},
  url       = {https://mlanthology.org/aistats/2023/acharya2023aistats-sample/}
}