Pixel-Level Certified Explanations via Randomized Smoothing

Abstract

Post-hoc attribution methods aim to explain deep learning predictions by highlighting influential input pixels. However, these explanations are highly non-robust: small, imperceptible input perturbations can drastically alter the attribution map while maintaining the same prediction. This vulnerability undermines their trustworthiness and calls for rigorous robustness guarantees of pixel-level attribution scores. We introduce the first certification framework that guarantees pixel-level robustness for any black-box attribution method using randomized smoothing. By sparsifying and smoothing attribution maps, we reformulate the task as a segmentation problem and certify each pixel’s importance against $\ell_2$-bounded perturbations. We further propose three evaluation metrics to assess certified robustness, localization, and faithfulness. An extensive evaluation of 12 attribution methods across 5 ImageNet models shows that our certified attributions are robust, interpretable, and faithful, enabling reliable use in downstream tasks. Our code is at https://github.com/AlaaAnani/certified-attributions.
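To make the pipeline concrete, the following is a minimal Python sketch of the certification idea described above: attribution maps are computed under Gaussian input noise, sparsified into a binary important/unimportant map, aggregated by a per-pixel majority vote (the "segmentation" view), and each pixel then receives a certified l2 radius via the standard randomized-smoothing bound. The function name certify_pixels and the parameters attribution_fn, sigma, n, tau, and alpha are illustrative assumptions, not the authors' released API; see the linked repository for the actual implementation.

import numpy as np
from scipy.stats import norm, binomtest

def certify_pixels(x, attribution_fn, sigma=0.25, n=100, tau=0.2, alpha=0.001):
    """Per-pixel certification of a sparsified attribution map via randomized
    smoothing (a minimal sketch under the assumptions stated above).

    x              : input image, shape (C, H, W), values in [0, 1]
    attribution_fn : black-box attribution method (wrapping the model and a
                     fixed target class) that returns an (H, W) map
    sigma          : std. dev. of the isotropic Gaussian noise
    n              : number of noise samples
    tau            : fraction of pixels kept as "important" when sparsifying
    alpha          : failure probability of the certificate
    """
    H, W = x.shape[1], x.shape[2]
    votes = np.zeros((H, W), dtype=int)  # votes for the "important" class

    for _ in range(n):
        noisy = x + sigma * np.random.randn(*x.shape)
        attr = attribution_fn(noisy)                 # (H, W) attribution map
        k = int(tau * H * W)
        thresh = np.partition(attr.ravel(), -k)[-k]  # keep the top tau fraction
        votes += (attr >= thresh).astype(int)        # binary per-pixel vote

    radius = np.zeros((H, W))
    # 1 = certified important, 0 = certified unimportant, -1 = abstain
    label = np.full((H, W), -1, dtype=int)
    for i in range(H):
        for j in range(W):
            top = max(votes[i, j], n - votes[i, j])
            # Lower confidence bound on the probability of the majority class
            ci = binomtest(top, n).proportion_ci(confidence_level=1 - alpha,
                                                 method="exact")
            if ci.low > 0.5:
                label[i, j] = 1 if votes[i, j] >= n - votes[i, j] else 0
                radius[i, j] = sigma * norm.ppf(ci.low)  # certified l2 radius

    return label, radius

Pixels whose lower confidence bound on the majority class does not exceed 1/2 abstain, mirroring the abstain option in classifier-level randomized smoothing; all other pixels carry a per-pixel l2 robustness radius.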

Cite

Text

Anani et al. "Pixel-Level Certified Explanations via Randomized Smoothing." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Anani et al. "Pixel-Level Certified Explanations via Randomized Smoothing." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/anani2025icml-pixellevel/)

BibTeX

@inproceedings{anani2025icml-pixellevel,
  title     = {{Pixel-Level Certified Explanations via Randomized Smoothing}},
  author    = {Anani, Alaa and Lorenz, Tobias and Fritz, Mario and Schiele, Bernt},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {1505--1533},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/anani2025icml-pixellevel/}
}