Feature Perturbation Augmentation for Reliable Evaluation of Importance Estimators

Abstract

Post-hoc explanation methods attempt to make the inner workings of deep neural networks, which otherwise act as black-box models, more comprehensible and trustworthy. However, because a ground truth is generally lacking, local post-hoc explanation methods, which assign importance scores to input features, are challenging to evaluate. One of the most popular evaluation frameworks perturbs the features that an explanation deems important and measures the change in prediction accuracy. Intuitively, a large decrease in prediction accuracy would indicate that the explanation has correctly quantified the importance of features with respect to the prediction outcome (e.g., logits). However, the change in the prediction outcome may instead stem from perturbation artifacts: perturbed samples in the test dataset are out of distribution (OOD) relative to the training dataset and can therefore disturb the model in unexpected ways. To overcome this challenge, we propose feature perturbation augmentation (FPA), which creates perturbed images and adds them during model training. Our computational experiments suggest that FPA makes the considered models more robust against perturbations. Overall, FPA is an intuitive and straightforward data augmentation technique that renders the evaluation of post-hoc explanations more trustworthy.
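As a rough illustration of the two ideas in the abstract, the sketch below shows (1) the evaluation step, in which the pixels ranked most important by an explanation are replaced, and (2) a training-time augmentation that randomly perturbs pixels so that perturbed inputs are no longer out of distribution. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `perturb_top_features` and `fpa_augment`, the single-channel image layout, the constant `fill_value`, and the parameters `max_fraction` and `p` are all hypothetical, since the abstract does not specify the perturbation scheme.

```python
import numpy as np


def perturb_top_features(image, importance, fraction, fill_value=0.0):
    """Evaluation step: replace the `fraction` highest-scored pixels.

    `image` and `importance` are arrays of the same shape (assumed here to be
    a single-channel H x W image and its importance map).
    """
    flat = image.reshape(-1).copy()  # copy so the input image is not modified
    k = int(round(fraction * flat.size))
    if k > 0:
        # Indices of the k largest importance scores, highest first.
        top = np.argsort(importance.reshape(-1))[::-1][:k]
        flat[top] = fill_value
    return flat.reshape(image.shape)


def fpa_augment(batch, rng, max_fraction=0.5, fill_value=0.0, p=0.5):
    """Training-time augmentation (assumed form): randomly perturb pixels.

    With probability `p`, each image in `batch` (shape N x H x W) has a random
    fraction of its pixels, drawn uniformly from [0, max_fraction), replaced
    by `fill_value`. The remaining images are left unchanged.
    """
    out = batch.copy()
    for i in range(out.shape[0]):
        if rng.random() < p:
            fraction = rng.uniform(0.0, max_fraction)
            mask = rng.random(out[i].shape) < fraction
            out[i][mask] = fill_value
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = rng.random((4, 8, 8))
    augmented = fpa_augment(images, rng)
    # Here the image itself stands in for an importance map, for illustration.
    perturbed = perturb_top_features(images[0], images[0], fraction=0.25)
    print(augmented.shape, perturbed.shape)
```

In a training loop, `fpa_augment` would be applied to each mini-batch before the forward pass, while `perturb_top_features` would be applied only at evaluation time with importance maps produced by the explanation method under test. Random pixel masks are used here only because they are independent of any particular explanation method; other mask shapes or replacement values would fit the same pattern.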

Cite

Text

Brocki and Chung. "Feature Perturbation Augmentation for Reliable Evaluation of Importance Estimators." ICLR 2023 Workshops: Trustworthy_ML, 2023.

Markdown

[Brocki and Chung. "Feature Perturbation Augmentation for Reliable Evaluation of Importance Estimators." ICLR 2023 Workshops: Trustworthy_ML, 2023.](https://mlanthology.org/iclrw/2023/brocki2023iclrw-feature/)

BibTeX

@inproceedings{brocki2023iclrw-feature,
  title     = {{Feature Perturbation Augmentation for Reliable Evaluation of Importance Estimators}},
  author    = {Brocki, Lennart and Chung, Neo Christopher},
  booktitle = {ICLR 2023 Workshops: Trustworthy_ML},
  year      = {2023},
  url       = {https://mlanthology.org/iclrw/2023/brocki2023iclrw-feature/}
}