Towards Robust Saliency Maps

Abstract

Saliency maps are one of the most popular tools for interpreting the operation of a neural network: they compute the input features deemed relevant to the final prediction, often subsets of pixels that a human can readily understand. However, it is known that relying solely on human assessment to judge a saliency map method can be misleading. In this work, we propose a new neural network verification specification called saliency-robustness, which aims to use formal methods to prove a relationship between Vanilla Gradient (VG), a simple yet surprisingly effective saliency map method, and the network’s prediction: given a network, if an input $x$ emits a certain VG saliency map, it is mathematically proven (or disproven) that the network must classify $x$ in a certain way. We then introduce a novel method that combines Marabou and Crown, two state-of-the-art neural network verifiers, to solve the proposed specification. Experiments on our synthetic dataset and MNIST show that Vanilla Gradient is surprisingly effective as a certification for the predicted output.
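
To make the Vanilla Gradient baseline concrete, below is a minimal, hypothetical PyTorch sketch (not the authors' code) of how a VG saliency map is typically computed: the gradient of the predicted class score with respect to the input. The function name vanilla_gradient and the toy network are illustrative assumptions, not part of the paper.

import torch
import torch.nn as nn

def vanilla_gradient(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Vanilla Gradient saliency: gradient of the predicted class score w.r.t. x."""
    model.eval()
    x = x.clone().detach().requires_grad_(True)
    scores = model(x)                        # raw logits, shape (1, num_classes)
    predicted = scores.argmax(dim=1).item()  # class the network assigns to x
    scores[0, predicted].backward()          # d(predicted-class score) / d(input)
    return x.grad.detach()                   # saliency map, same shape as x

# Toy usage on a small fully connected network (hypothetical architecture).
torch.manual_seed(0)
net = nn.Sequential(nn.Linear(784, 32), nn.ReLU(), nn.Linear(32, 10))
x = torch.rand(1, 784)                       # e.g. a flattened MNIST-sized image
saliency = vanilla_gradient(net, x)
print(saliency.shape)                        # torch.Size([1, 784])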

Cite

Text

Le et al. "Towards Robust Saliency Maps." Proceedings of the 16th Asian Conference on Machine Learning, 2024.

Markdown

[Le et al. "Towards Robust Saliency Maps." Proceedings of the 16th Asian Conference on Machine Learning, 2024.](https://mlanthology.org/acml/2024/le2024acml-robust/)

BibTeX

@inproceedings{le2024acml-robust,
  title     = {{Towards Robust Saliency Maps}},
  author    = {Le, Nham and Gurfinkel, Arie and Si, Xujie and Geng, Chuqin},
  booktitle = {Proceedings of the 16th Asian Conference on Machine Learning},
  year      = {2024},
  pages     = {351--366},
  volume    = {260},
  url       = {https://mlanthology.org/acml/2024/le2024acml-robust/}
}