On the Robustness of Global Feature Effect Explanations

Abstract

We study the robustness of global post-hoc explanations for predictive models trained on tabular data. Effects of predictor features in black-box supervised learning are an essential diagnostic tool for model debugging and scientific discovery in applied sciences. However, how vulnerable they are to data and model perturbations remains an open research question. We introduce several theoretical bounds for evaluating the robustness of partial dependence plots and accumulated local effects. Our experimental results with synthetic and real-world datasets quantify the gap between the best and worst-case scenarios of (mis)interpreting machine learning predictions globally.
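
For context on the two explanations studied: a partial dependence plot averages model predictions over the data while the feature of interest is fixed to grid values, and accumulated local effects instead aggregate local prediction differences within feature bins. Below is a minimal sketch of the empirical partial dependence estimator only, assuming a fitted model exposing a scikit-learn-style predict method and a NumPy feature matrix X; the function name and signature are illustrative, not the paper's implementation.

import numpy as np

def partial_dependence(model, X, feature, grid):
    """Monte Carlo estimate of partial dependence:
    PD_j(z) = E_X[ f(X with feature j set to z) ]."""
    pd_curve = []
    for z in grid:
        X_mod = X.copy()
        X_mod[:, feature] = z  # intervene on feature j, keep other features as observed
        pd_curve.append(model.predict(X_mod).mean())
    return np.array(pd_curve)

The robustness question the paper formalizes can be read off this estimator: perturbing the data X or the model shifts the resulting curve, and the paper's theoretical bounds quantify how large that shift can be.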

Cite

Text

Baniecki et al. "On the Robustness of Global Feature Effect Explanations." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2024. doi:10.1007/978-3-031-70344-7_8

Markdown

[Baniecki et al. "On the Robustness of Global Feature Effect Explanations." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2024.](https://mlanthology.org/ecmlpkdd/2024/baniecki2024ecmlpkdd-robustness/) doi:10.1007/978-3-031-70344-7_8

BibTeX

@inproceedings{baniecki2024ecmlpkdd-robustness,
  title     = {{On the Robustness of Global Feature Effect Explanations}},
  author    = {Baniecki, Hubert and Casalicchio, Giuseppe and Bischl, Bernd and Biecek, Przemyslaw},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2024},
  pages     = {125--142},
  doi       = {10.1007/978-3-031-70344-7_8},
  url       = {https://mlanthology.org/ecmlpkdd/2024/baniecki2024ecmlpkdd-robustness/}
}