Adversarial Vulnerability from On-Manifold Inseparability and Poor Off-Manifold Convergence

Abstract

We introduce a new perspective on adversarial vulnerability in image classification: fragility can arise from poor convergence in off-manifold directions. We model data as lying on low-dimensional manifolds, where on-manifold directions correspond to high-variance, data-aligned features and off-manifold directions capture low-variance, nuanced features. Standard first-order optimizers, such as gradient descent, suffer from ill-conditioning, leading to slow or incomplete convergence in off-manifold directions. When data is inseparable along the on-manifold directions, robustness depends on learning these subtle off-manifold features, and failure to converge leaves models exposed to adversarial perturbations. On the theoretical side, we formalize this mechanism through convergence analyses of logistic regression and two-layer linear networks under first-order methods. These results highlight how ill-conditioning slows or prevents convergence in off-manifold directions, thereby motivating the use of second-order methods, which mitigate ill-conditioning and achieve convergence across all directions. Empirically, we demonstrate that even without adversarial training, robustness improves significantly with extended training or second-order optimization, underscoring convergence as a central factor. As an auxiliary empirical finding, we observe that batch normalization suppresses these robustness gains, consistent with its implicit bias toward uniform-margin rather than max-margin solutions. By introducing the notions of on- and off-manifold convergence, this work provides a novel theoretical explanation for adversarial vulnerability.
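
The convergence gap described in the abstract can be illustrated with a minimal sketch. It is not the paper's logistic-regression or two-layer analysis: it swaps in a toy diagonal quadratic loss whose curvature along each direction stands in for the feature variance. The variances `1.0` and `1e-3`, the step size, the step count, and all function names are assumptions chosen for illustration.

```python
# Toy illustration only (assumed setup, not the paper's experiments):
# f(w) = 0.5 * sum_i h[i] * (w[i] - w_star[i])**2
# h[0] stands in for a high-variance on-manifold direction,
# h[1] for a low-variance off-manifold direction.

def gradient(w, w_star, h):
    """Gradient of the diagonal quadratic loss at w."""
    return [h_i * (w_i - s_i) for w_i, s_i, h_i in zip(w, w_star, h)]

def gradient_descent(w0, w_star, h, lr, steps):
    """Plain first-order updates with a single shared step size."""
    w = list(w0)
    for _ in range(steps):
        g = gradient(w, w_star, h)
        w = [w_i - lr * g_i for w_i, g_i in zip(w, g)]
    return w

def newton_step(w0, w_star, h):
    """One second-order update: rescale each gradient entry by 1 / curvature."""
    g = gradient(w0, w_star, h)
    return [w_i - g_i / h_i for w_i, g_i, h_i in zip(w0, g, h)]

h = [1.0, 1e-3]        # assumed curvatures: on-manifold, off-manifold
w_star = [1.0, 1.0]    # hypothetical minimizer
w0 = [0.0, 0.0]

# The step size must respect the largest curvature, which starves the
# low-curvature direction of progress.
w_gd = gradient_descent(w0, w_star, h, lr=0.5 / max(h), steps=100)
w_newton = newton_step(w0, w_star, h)

gd_error = [abs(a - b) for a, b in zip(w_gd, w_star)]
newton_error = [abs(a - b) for a, b in zip(w_newton, w_star)]

print("GD error (on, off):    ", gd_error)
print("Newton error (on, off):", newton_error)
```

In this toy, the on-manifold error under gradient descent halves at every step, while the off-manifold error shrinks by a factor of only `1 - 0.0005` per step, so most of it remains after 100 steps. The Newton step removes the dependence on the condition number, which is the intuition behind the abstract's motivation for second-order methods.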

Cite

Text

Haldar et al. "Adversarial Vulnerability from On-Manifold Inseparability and Poor Off-Manifold Convergence." Transactions on Machine Learning Research, 2026.

Markdown

[Haldar et al. "Adversarial Vulnerability from On-Manifold Inseparability and Poor Off-Manifold Convergence." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/haldar2026tmlr-adversarial/)

BibTeX

@article{haldar2026tmlr-adversarial,
  title     = {{Adversarial Vulnerability from On-Manifold Inseparability and Poor Off-Manifold Convergence}},
  author    = {Haldar, Rajdeep and Xing, Yue and Song, Qifan and Lin, Guang},
  journal   = {Transactions on Machine Learning Research},
  year      = {2026},
  url       = {https://mlanthology.org/tmlr/2026/haldar2026tmlr-adversarial/}
}