Nasty Adversarial Training: A Probability Sparsity Perspective for Robustness Enhancement
Abstract
The vulnerability of deep neural networks to adversarial examples poses significant challenges to their reliable deployment. Among existing empirical defenses, adversarial training and robust distillation have proven to be the most effective. In this paper, we identify a property originally associated with model intellectual property, namely the probability sparsity induced by nasty training, and demonstrate that it can also provide interpretable improvements in adversarial robustness. We begin by analyzing how nasty training induces sparse probability distributions and qualitatively explore the spatial metric preferences that this sparsity introduces into the model. Building on these insights, we propose a simple yet effective adversarial training method, nasty adversarial training (NAT), which incorporates probability sparsity as a regularization mechanism to boost adversarial robustness. Both theoretical analysis and experimental results validate the effectiveness of NAT, highlighting its potential to enhance the adversarial robustness of deep neural networks in an interpretable manner.
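The abstract does not give NAT's objective. As a rough illustration only, the sketch below shows two things. First, it shows one common way to quantify how sparse a predicted distribution is, namely its Shannon entropy, where lower entropy means the probability mass sits on fewer classes. Second, it shows a hypothetical regularized loss in the spirit of nasty training: it fits the label while pushing the model's temperature-scaled output away from a reference model's. The function `nasty_style_loss`, its parameters `weight` and `tau`, their default values, and the direction of the KL term are all assumptions for illustration. They are not the paper's formulation. In an adversarial training loop the model logits would come from adversarially perturbed inputs, and this sketch does not generate those.

```python
import math
from typing import List


def softmax(logits: List[float], tau: float = 1.0) -> List[float]:
    # Temperature-scaled softmax; subtracting the max keeps exp() from overflowing.
    m = max(logits)
    exps = [math.exp((z - m) / tau) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p: List[float]) -> float:
    # Shannon entropy in nats: a lower value indicates a sparser (more peaked) distribution.
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def kl_div(p: List[float], q: List[float]) -> float:
    # KL(p || q); terms with p_i == 0 contribute nothing.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(logits: List[float], label: int) -> float:
    # Standard cross-entropy on the untempered softmax.
    return -math.log(softmax(logits)[label])


def nasty_style_loss(model_logits: List[float],
                     reference_logits: List[float],
                     label: int,
                     weight: float = 0.04,
                     tau: float = 4.0) -> float:
    # HYPOTHETICAL objective (not NAT's published loss): cross-entropy minus a
    # tau^2-scaled KL term, so minimizing it rewards diverging from the
    # reference model's softened predictions.
    p_ref = softmax(reference_logits, tau)
    p_model = softmax(model_logits, tau)
    return cross_entropy(model_logits, label) - weight * tau ** 2 * kl_div(p_ref, p_model)


if __name__ == "__main__":
    peaked = softmax([10.0, 0.0, 0.0])
    flat = softmax([1.0, 1.0, 1.0])
    print("entropy (peaked):", entropy(peaked))
    print("entropy (flat):  ", entropy(flat))
    print("loss:", nasty_style_loss([3.0, 0.5, -1.0], [0.0, 2.0, 1.0], label=0))
```

Because the KL term enters with a negative sign, the loss drops as the model's tempered output moves away from the reference model's. How NAT actually trades this kind of sparsity off against robustness is not specified in this abstract.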
Cite
Text
Zhou et al. "Nasty Adversarial Training: A Probability Sparsity Perspective for Robustness Enhancement." International Conference on Learning Representations, 2026.
Markdown
[Zhou et al. "Nasty Adversarial Training: A Probability Sparsity Perspective for Robustness Enhancement." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zhou2026iclr-nasty/)
BibTeX
@inproceedings{zhou2026iclr-nasty,
title = {{Nasty Adversarial Training: A Probability Sparsity Perspective for Robustness Enhancement}},
author = {Zhou, Yuhang and Hua, Zhongyun and Gu, Zhaoquan and Tang, Keke and Lan, Rushi and Zhang, Yushu and Liao, Qing and Zhang, Leo Yu},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/zhou2026iclr-nasty/}
}