Improving Interpretability via Regularization of Neural Activation Sensitivity
Abstract
State-of-the-art deep neural networks (DNNs) are highly effective at tackling many real-world tasks. However, their widespread adoption in mission-critical contexts is limited by two major weaknesses: their susceptibility to adversarial attacks and their opaqueness. The former raises concerns about DNNs' security and generalization in real-world conditions, while the latter directly impacts interpretability. The lack of interpretability diminishes user trust, as it is difficult to have confidence in a model's decision when its reasoning does not align with human perspectives. In this research, we (1) examine the effect of adversarial robustness on interpretability, and (2) present a novel approach for improving DNNs' interpretability based on the regularization of neural activation sensitivity. We compare the interpretability of models trained using our method to that of standard models and of models trained using state-of-the-art adversarial robustness techniques. Our results show that adversarially robust models are more interpretable than standard models, and that models trained using our proposed method surpass even adversarially robust models in terms of interpretability. (Code is provided in the supplementary material.)
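The abstract names the core technique, regularization of neural activation sensitivity, without detailing it. As a hedged sketch of the general idea (not the paper's exact formulation), one common way to regularize activation sensitivity is to penalize the norm of the Jacobian of hidden activations with respect to the input, so that small input perturbations cannot sharply change what the network's units fire on. The function names and the toy loss below are illustrative assumptions:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def activation_sensitivity_penalty(W, x):
    """Squared Frobenius norm of the Jacobian of the hidden activations
    h = relu(W @ x) with respect to the input x.
    For ReLU, dh/dx = diag(1[W @ x > 0]) @ W."""
    pre = W @ x
    mask = (pre > 0).astype(float)   # ReLU derivative, one value per hidden unit
    jac = mask[:, None] * W          # Jacobian of activations w.r.t. the input
    return np.sum(jac ** 2)          # squared Frobenius norm as the penalty

def loss_with_sensitivity_regularization(W, x, y, lam=0.1):
    # A toy task loss (squared error on the mean activation) plus the
    # sensitivity penalty weighted by lam; both choices are illustrative.
    h = relu(W @ x)
    task_loss = (h.mean() - y) ** 2
    return task_loss + lam * activation_sensitivity_penalty(W, x)
```

In a real training loop this penalty would be added to the task loss and minimized jointly, trading task accuracy (via `lam`) against how sensitively the hidden units respond to input changes.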
Cite
Text
Moshe et al. "Improving Interpretability via Regularization of Neural Activation Sensitivity." Machine Learning, 2024. doi:10.1007/S10994-024-06549-4
Markdown
[Moshe et al. "Improving Interpretability via Regularization of Neural Activation Sensitivity." Machine Learning, 2024.](https://mlanthology.org/mlj/2024/moshe2024mlj-improving/) doi:10.1007/S10994-024-06549-4
BibTeX
@article{moshe2024mlj-improving,
title = {{Improving Interpretability via Regularization of Neural Activation Sensitivity}},
author = {Moshe, Ofir and Fidel, Gil and Bitton, Ron and Shabtai, Asaf},
journal = {Machine Learning},
year = {2024},
pages = {6165-6196},
doi = {10.1007/S10994-024-06549-4},
volume = {113},
url = {https://mlanthology.org/mlj/2024/moshe2024mlj-improving/}
}