Causation Does Not Imply Correlation: A Study of Circuit Mechanisms and Model Behaviors

Abstract

Using a toy balanced parenthesis classification task with an ambiguous rule, we investigate the correspondence between attention patterns and the out-of-distribution (OOD) generalization behavior of small transformer models. We find that observational tools can predict OOD behavior, challenging the common notion among interpretability researchers that causal intervention is the only basis for explaining model behavior.

Cite

Text

Kaufmann et al. "Causation Does Not Imply Correlation: A Study of Circuit Mechanisms and Model Behaviors." NeurIPS 2024 Workshops: SciForDL, 2024.

Markdown

[Kaufmann et al. "Causation Does Not Imply Correlation: A Study of Circuit Mechanisms and Model Behaviors." NeurIPS 2024 Workshops: SciForDL, 2024.](https://mlanthology.org/neuripsw/2024/kaufmann2024neuripsw-causation/)

BibTeX

@inproceedings{kaufmann2024neuripsw-causation,
  title     = {{Causation Does Not Imply Correlation: A Study of Circuit Mechanisms and Model Behaviors}},
  author    = {Kaufmann, Jenny and Li, Victoria R and Wattenberg, Martin and Alvarez-Melis, David and Saphra, Naomi},
  booktitle = {NeurIPS 2024 Workshops: SciForDL},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/kaufmann2024neuripsw-causation/}
}