Revealing Vulnerable Regions Through Diverse Adversarial Examples

Abstract

Current explainable AI approaches to Deep Neural Networks (DNNs) primarily aim to understand network behavior by identifying key input features that influence predictions. However, these methods often fail to identify vulnerable regions in the input that are sensitive to minor perturbations and pose significant security risks. The vulnerability of DNNs is typically studied through adversarial examples, but traditional norm-based algorithms, lacking spatial constraints, distribute perturbations across the entire image, obscuring these critical areas. To overcome this limitation, we propose the Vulnerable Region Discovery Attack (VrdAttack), an efficient method that leverages Differential Evolution to generate diverse one-pixel perturbations, enabling the discovery of vulnerable regions and uncovering pixel-level vulnerabilities in DNNs. Our extensive experiments on CIFAR-10 and ImageNet demonstrate that VrdAttack outperforms existing methods in identifying diverse critical weak points in an input, highlighting model-specific vulnerabilities, and revealing the impact of adversarial training on these vulnerable regions.
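To make the idea concrete, here is a minimal, hedged sketch of a one-pixel attack driven by Differential Evolution (DE/rand/1 mutation with greedy selection). It is not the authors' VrdAttack implementation: the `toy_confidence` function below is a stand-in for a real classifier's true-class confidence, and the population size, mutation factor `F`, and iteration budget are illustrative assumptions.

```python
# Sketch of a one-pixel adversarial search via Differential Evolution.
# NOTE: toy_confidence is a placeholder "model", not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)

def toy_confidence(image):
    """Stand-in classifier: confidence in the true class (assumption).
    Here, brighter images score higher, purely for illustration."""
    return image.mean()

def apply_pixel(image, cand):
    """A candidate is (x, y, r, g, b): overwrite one pixel's colour."""
    img = image.copy()
    x = int(cand[0]) % img.shape[0]
    y = int(cand[1]) % img.shape[1]
    img[x, y] = np.clip(cand[2:5], 0.0, 1.0)
    return img

def one_pixel_de(image, pop_size=20, iters=30, F=0.5):
    h, w, _ = image.shape
    lo = np.array([0, 0, 0, 0, 0], dtype=float)
    hi = np.array([h - 1, w - 1, 1, 1, 1], dtype=float)
    # Random initial population of (x, y, r, g, b) candidates.
    pop = rng.uniform(lo, hi, size=(pop_size, 5))
    fit = np.array([toy_confidence(apply_pixel(image, c)) for c in pop])
    for _ in range(iters):
        for i in range(pop_size):
            a, b, c = pop[rng.choice(pop_size, 3, replace=False)]
            trial = np.clip(a + F * (b - c), lo, hi)  # DE/rand/1 mutation
            f = toy_confidence(apply_pixel(image, trial))
            if f < fit[i]:  # greedy selection: minimise true-class confidence
                pop[i], fit[i] = trial, f
    best = pop[fit.argmin()]
    return best, fit.min()

image = rng.uniform(0.4, 0.6, size=(8, 8, 3))  # dummy 8x8 RGB input
best, conf = one_pixel_de(image)
```

The key property mirrored from the abstract is that the search space is a single pixel's coordinates and colour, so the perturbation is spatially localised by construction; running the search repeatedly (or maintaining a diverse population) yields multiple distinct weak points rather than one diffuse perturbation.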

Cite

Text

Zhao et al. "Revealing Vulnerable Regions Through Diverse Adversarial Examples." Machine Learning, 2025. doi:10.1007/s10994-025-06788-z

Markdown

[Zhao et al. "Revealing Vulnerable Regions Through Diverse Adversarial Examples." Machine Learning, 2025.](https://mlanthology.org/mlj/2025/zhao2025mlj-revealing/) doi:10.1007/s10994-025-06788-z

BibTeX

@article{zhao2025mlj-revealing,
  title     = {{Revealing Vulnerable Regions Through Diverse Adversarial Examples}},
  author    = {Zhao, Yunce and Huang, Wei and Liu, Wei and Yao, Xin},
  journal   = {Machine Learning},
  year      = {2025},
  pages     = {163},
  doi       = {10.1007/s10994-025-06788-z},
  volume    = {114},
  url       = {https://mlanthology.org/mlj/2025/zhao2025mlj-revealing/}
}