Effect of Ambient-Intrinsic Dimension Gap on Adversarial Vulnerability

Abstract

The existence of adversarial attacks on machine learning models that are imperceptible to a human remains a mystery from a theoretical perspective. In this work, we introduce two notions of adversarial attacks: natural or on-manifold attacks, which are perceptible to a human/oracle, and unnatural or off-manifold attacks, which are not. We argue that the existence of off-manifold attacks is a natural consequence of the gap between the intrinsic and ambient dimensions of the data. For 2-layer ReLU networks, we prove that even though the dimension gap does not affect generalization performance on samples drawn from the observed data space, it makes the clean-trained model more vulnerable to adversarial perturbations in the off-manifold directions of the data space. Our main results provide an explicit relationship between the $\ell_2$ and $\ell_\infty$ attack strengths of the on/off-manifold attacks and the dimension gap.
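To make the on/off-manifold distinction concrete, here is a minimal sketch (not the paper's construction) assuming the data lie on a d-dimensional linear subspace of the D-dimensional ambient space, so the dimension gap is D - d. A perturbation is split into its on-manifold component (visible to an oracle that only sees manifold coordinates) and its off-manifold component (invisible to it):

```python
import numpy as np

rng = np.random.default_rng(0)

D, d = 20, 3                                   # ambient and intrinsic dimensions
U = np.linalg.qr(rng.normal(size=(D, d)))[0]   # orthonormal basis of the data subspace

x = U @ rng.normal(size=d)                     # a clean sample on the manifold
delta = 0.1 * rng.normal(size=D)               # an arbitrary adversarial perturbation

delta_on = U @ (U.T @ delta)                   # on-manifold ("natural") component
delta_off = delta - delta_on                   # off-manifold ("unnatural") component

print("on-manifold  norm:", np.linalg.norm(delta_on))
print("off-manifold norm:", np.linalg.norm(delta_off))
```

With a large gap D - d, a random perturbation of fixed $\ell_2$ budget places most of its mass off-manifold, which is the direction in which the paper argues a clean-trained 2-layer ReLU network is most vulnerable.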

Cite

Text

Haldar et al. "Effect of Ambient-Intrinsic Dimension Gap on Adversarial Vulnerability." Artificial Intelligence and Statistics, 2024.

Markdown

[Haldar et al. "Effect of Ambient-Intrinsic Dimension Gap on Adversarial Vulnerability." Artificial Intelligence and Statistics, 2024.](https://mlanthology.org/aistats/2024/haldar2024aistats-effect/)

BibTeX

@inproceedings{haldar2024aistats-effect,
  title     = {{Effect of Ambient-Intrinsic Dimension Gap on Adversarial Vulnerability}},
  author    = {Haldar, Rajdeep and Xing, Yue and Song, Qifan},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2024},
  pages     = {1090--1098},
  volume    = {238},
  url       = {https://mlanthology.org/aistats/2024/haldar2024aistats-effect/}
}