Identifying Regularization Schemes That Make Feature Attributions Faithful

Abstract

Feature attribution methods assign a score to each input dimension as a measure of that dimension's relevance to a model's output. Despite their wide use, the feature importance rankings induced by gradient-based feature attributions are unfaithful, that is, they do not correlate with the input-perturbation sensitivity of the model, unless the model is trained to be adversarially robust. Here we demonstrate that these concerns translate to models trained for protein function prediction tasks. Although adversarial training makes a model's gradient-based attributions faithful to the model, it performs poorly on real data. We find that independent Gaussian noise corruption is an effective alternative to adversarial training that confers faithfulness onto a model's gradient-based attributions without performance degradation. On the other hand, we observe no meaningful faithfulness benefits from regularization schemes like dropout and weight decay. We translate these insights to a real-world protein function prediction task, where the gradient-based feature attributions of noise-regularized models correctly indicate low sensitivity to irrelevant gap tokens in a protein's sequence alignment.
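To make the notion of faithfulness used above concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical toy model (`tanh` of a weighted sum), a finite-difference measure of input-perturbation sensitivity, and Spearman rank correlation as the agreement score; the function names, `eps`, and `sigma` are all illustrative choices that do not come from the paper. The `add_gaussian_noise` helper shows the kind of independent Gaussian input corruption that could be applied to training inputs.

```python
import numpy as np

def model(x, w):
    # Hypothetical toy model (an assumption): tanh of a weighted sum.
    return np.tanh(x @ w)

def gradient_attribution(x, w):
    # Analytic input gradient: d/dx tanh(x.w) = (1 - tanh(x.w)^2) * w
    return (1.0 - np.tanh(x @ w) ** 2) * w

def perturbation_sensitivity(x, w, eps=1e-3):
    # Absolute change in output when each input dimension is nudged by eps.
    base = model(x, w)
    sens = np.empty_like(x)
    for i in range(x.size):
        x_pert = x.copy()
        x_pert[i] += eps
        sens[i] = abs(model(x_pert, w) - base)
    return sens

def spearman(a, b):
    # Spearman rank correlation, assuming no ties in a or b.
    ranks_a = np.argsort(np.argsort(a)).astype(float)
    ranks_b = np.argsort(np.argsort(b)).astype(float)
    return np.corrcoef(ranks_a, ranks_b)[0, 1]

def add_gaussian_noise(x, sigma, rng):
    # Independent Gaussian corruption of every input dimension.
    # sigma is an illustrative value, not one taken from the paper.
    return x + sigma * rng.standard_normal(x.shape)

rng = np.random.default_rng(0)
w = rng.standard_normal(8)
x = 0.1 * rng.standard_normal(8)

attribution = np.abs(gradient_attribution(x, w))
sensitivity = perturbation_sensitivity(x, w)

# Faithfulness score: do the two feature rankings agree?
faithfulness = spearman(attribution, sensitivity)

# Corrupted copy of the input, as would be fed to the model during training.
x_noisy = add_gaussian_noise(x, sigma=0.1, rng=rng)
```

For this smooth toy model the two rankings should agree closely, since a small perturbation changes the output roughly in proportion to the gradient. The abstract's concern is that the same agreement does not hold for trained networks unless they are suitably regularized.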

Cite

Text

Adebayo et al. "Identifying Regularization Schemes That Make Feature Attributions Faithful." NeurIPS 2023 Workshops: AI4D3, 2023.

Markdown

[Adebayo et al. "Identifying Regularization Schemes That Make Feature Attributions Faithful." NeurIPS 2023 Workshops: AI4D3, 2023.](https://mlanthology.org/neuripsw/2023/adebayo2023neuripsw-identifying/)

BibTeX

@inproceedings{adebayo2023neuripsw-identifying,
  title     = {{Identifying Regularization Schemes That Make Feature Attributions Faithful}},
  author    = {Adebayo, Julius and Stanton, Samuel Don and Kelow, Simon and Maser, Michael and Bonneau, Richard and Gligorijevic, Vladimir and Cho, Kyunghyun and Ra, Stephen and Frey, Nathan C.},
  booktitle = {NeurIPS 2023 Workshops: AI4D3},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/adebayo2023neuripsw-identifying/}
}