Data-Faithful Feature Attribution: Mitigating Unobservable Confounders via Instrumental Variables

Abstract

State-of-the-art feature attribution methods often neglect the influence of unobservable confounders, which risks misinterpretation, especially when the interpretation must remain faithful to the data. To counteract this, we propose a new approach, data-faithful feature attribution, which trains a confounder-free model using instrumental variables. In a model trained this way, the entangled effects of unobservable confounders are decoupled from the input features, so the model's output aligns with the contribution of the input features to the target feature in the data-generating process. Furthermore, the feature attribution results produced by our method are more robust when attributions are considered from the perspective of data generation. Our experiments on both synthetic and real-world datasets demonstrate the effectiveness of our approaches.
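
The idea of using instrumental variables to remove confounding can be illustrated with classic two-stage least squares on a toy linear problem. This is only a minimal sketch and not the paper's training procedure: the data-generating process, the coefficients, and the helper names `simulate`, `ols_slope`, and `two_stage_slope` are all assumptions made for illustration. An unobserved confounder `u` drives both the feature `x` and the target `y`, while the instrument `z` affects `y` only through `x`.

```python
import numpy as np


def simulate(n=20000, seed=0):
    """Toy data (assumed): true effect of x on y is 2.0; u is unobserved."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                      # instrument, independent of u
    u = rng.normal(size=n)                      # unobservable confounder
    x = z + u + 0.1 * rng.normal(size=n)        # feature depends on z and u
    y = 2.0 * x + 3.0 * u + 0.1 * rng.normal(size=n)
    return z, x, y


def ols_slope(a, b):
    """Slope of the least-squares fit of b on a (with intercept)."""
    a_c = a - a.mean()
    return float(a_c @ (b - b.mean()) / (a_c @ a_c))


def two_stage_slope(z, x, y):
    """Two-stage least squares with a single instrument z."""
    # Stage 1: predict x from z, keeping only the variation driven by z.
    x_hat = x.mean() + ols_slope(z, x) * (z - z.mean())
    # Stage 2: regress y on the confounder-free prediction of x.
    return ols_slope(x_hat, y)


z, x, y = simulate()
naive = ols_slope(x, y)           # biased upward: absorbs the effect of u
iv = two_stage_slope(z, x, y)     # close to the true coefficient 2.0
print(f"naive OLS slope: {naive:.2f}, 2SLS slope: {iv:.2f}")
```

In this toy setting, the naive regression attributes part of the confounder's effect to `x`, whereas the two-stage estimate stays close to the coefficient used to generate the data. An attribution computed on the naive model would inherit the same bias.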

Cite

Text

Sun et al. "Data-Faithful Feature Attribution: Mitigating Unobservable Confounders via Instrumental Variables." Neural Information Processing Systems, 2024. doi:10.52202/079017-1428

Markdown

[Sun et al. "Data-Faithful Feature Attribution: Mitigating Unobservable Confounders via Instrumental Variables." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/sun2024neurips-datafaithful/) doi:10.52202/079017-1428

BibTeX

@inproceedings{sun2024neurips-datafaithful,
  title     = {{Data-Faithful Feature Attribution: Mitigating Unobservable Confounders via Instrumental Variables}},
  author    = {Sun, Qiheng and Xia, Haocheng and Liu, Jinfei},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-1428},
  url       = {https://mlanthology.org/neurips/2024/sun2024neurips-datafaithful/}
}