Causation Does Not Imply Correlation: A Study of Circuit Mechanisms and Model Behaviors

Kaufmann, Jenny; Li, Victoria R; Wattenberg, Martin; Alvarez-Melis, David; Saphra, Naomi

NeurIPS 2024 Workshops: SciForDL

Abstract

Using a toy balanced parenthesis classification task with an ambiguous rule, we investigate the correspondence between attention patterns and out-of-distribution generalization behavior of small transformer models. We find that observational tools can predict OOD behavior, challenging the common notion among interpretability researchers that causal intervention is the only basis for explaining model behavior.

Cite

Text

Kaufmann et al. "Causation Does Not Imply Correlation: A Study of Circuit Mechanisms and Model Behaviors." NeurIPS 2024 Workshops: SciForDL, 2024.

Markdown

[Kaufmann et al. "Causation Does Not Imply Correlation: A Study of Circuit Mechanisms and Model Behaviors." NeurIPS 2024 Workshops: SciForDL, 2024.](https://mlanthology.org/neuripsw/2024/kaufmann2024neuripsw-causation/)

BibTeX

@inproceedings{kaufmann2024neuripsw-causation,
  title     = {{Causation Does Not Imply Correlation: A Study of Circuit Mechanisms and Model Behaviors}},
  author    = {Kaufmann, Jenny and Li, Victoria R and Wattenberg, Martin and Alvarez-Melis, David and Saphra, Naomi},
  booktitle = {NeurIPS 2024 Workshops: SciForDL},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/kaufmann2024neuripsw-causation/}
}