Sample, Estimate, Aggregate: A Recipe for Causal Discovery Foundation Models

Abstract

Causal discovery, the task of inferring causal structure from data, promises to accelerate scientific research, inform policy making, and more. However, the per-dataset nature of existing causal discovery algorithms renders them slow, data-hungry, and brittle. Inspired by foundation models, we propose a causal discovery framework in which a deep learning model is pretrained to resolve predictions from classical discovery algorithms run over smaller subsets of variables. This approach rests on the observations that the outputs of classical algorithms are fast to compute on small problems, informative of (marginal) causal structure, and, as graph objects, comparable across datasets. Our method achieves state-of-the-art performance on synthetic and realistic datasets, generalizes to data-generating mechanisms not seen during training, and offers inference speeds that are orders of magnitude faster than those of existing models.

Cite

Text

Wu et al. "Sample, Estimate, Aggregate: A Recipe for Causal Discovery Foundation Models." ICLR 2024 Workshops: MLGenX, 2024.

Markdown

[Wu et al. "Sample, Estimate, Aggregate: A Recipe for Causal Discovery Foundation Models." ICLR 2024 Workshops: MLGenX, 2024.](https://mlanthology.org/iclrw/2024/wu2024iclrw-sample/)

BibTeX

@inproceedings{wu2024iclrw-sample,
  title     = {{Sample, Estimate, Aggregate: A Recipe for Causal Discovery Foundation Models}},
  author    = {Wu, Menghua and Bao, Yujia and Barzilay, Regina and Jaakkola, Tommi},
  booktitle = {ICLR 2024 Workshops: MLGenX},
  year      = {2024},
  url       = {https://mlanthology.org/iclrw/2024/wu2024iclrw-sample/}
}