Reliable Poisoned Sample Detection Against Backdoor Attacks Enhanced by Sharpness Aware Minimization

Zhang, Mingda; Zhu, Mingli; Zhu, Zihao; Shen, Li; Wu, Baoyuan

ICLR 2026

Abstract

This work investigates Poisoned Sample Detection (PSD), a promising defense against backdoor attacks. We observe, however, that the effectiveness of many advanced PSD methods degrades significantly under weak backdoor attacks (e.g., low poisoning ratios or weak trigger patterns). To substantiate this observation, we conduct a statistical analysis across various attacks and PSD methods, which reveals a strong correlation between the strength of the backdoor effect and detection performance. Motivated by this finding, we propose amplifying the backdoor effect by training with Sharpness-Aware Minimization (SAM). Both theoretical insights and empirical evidence indicate that SAM enhances the activations of the top Trigger Activation Change (TAC) neurons while suppressing the others. Building on this, we introduce SAM-enhanced PSD, a simple yet effective framework that improves existing PSD methods by extracting detection features from a SAM-trained model rather than a conventionally trained one. Extensive experiments across multiple benchmarks demonstrate that our approach significantly improves detection performance under both strong and weak backdoor attacks, achieving an average True Positive Rate (TPR) gain of +34.3% over conventional PSD methods. Overall, we believe that the revealed correlation between the backdoor effect and detection performance could inspire future research.
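
For readers unfamiliar with SAM, the following is a minimal sketch of the generic SAM update rule (perturb the weights along the normalized gradient by a radius `rho`, then descend using the gradient at the perturbed point). It is not the authors' training code: the toy quadratic loss, the function names `quad_loss`, `quad_grad`, and `sam_step`, and the values of `lr` and `rho` are all assumptions chosen for illustration.

```python
import math


def quad_loss(w, curv):
    """Toy loss 0.5 * sum(c_i * w_i^2); a stand-in for a real training loss."""
    return 0.5 * sum(c * wi * wi for c, wi in zip(curv, w))


def quad_grad(w, curv):
    """Gradient of quad_loss with respect to w."""
    return [c * wi for c, wi in zip(curv, w)]


def sam_step(w, grad_fn, lr=0.05, rho=0.05):
    """One SAM update on a plain list of weights (hypothetical helper).

    1. Compute the gradient g at w.
    2. Move to the perturbed point w + rho * g / ||g||.
    3. Take a gradient-descent step from w using the gradient
       evaluated at the perturbed point.
    """
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        # Stationary point: no direction to perturb along.
        return list(w)
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


if __name__ == "__main__":
    curv = [1.0, 10.0]  # assumed curvatures: one flat and one sharp direction
    w = [1.0, 1.0]
    for _ in range(100):
        w = sam_step(w, lambda v: quad_grad(v, curv))
    print("final loss:", quad_loss(w, curv))
```

With `rho=0.0` the step reduces to ordinary gradient descent, which makes the extra perturbation the only difference between the two training regimes in this sketch. In a SAM-enhanced PSD pipeline as described in the abstract, detection features would then be extracted from the model trained this way; that extraction step is not shown here.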

Cite

Text

Zhang et al. "Reliable Poisoned Sample Detection Against Backdoor Attacks Enhanced by Sharpness Aware Minimization." International Conference on Learning Representations, 2026.

Markdown

[Zhang et al. "Reliable Poisoned Sample Detection Against Backdoor Attacks Enhanced by Sharpness Aware Minimization." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zhang2026iclr-reliable/)

BibTeX

@inproceedings{zhang2026iclr-reliable,
  title     = {{Reliable Poisoned Sample Detection Against Backdoor Attacks Enhanced by Sharpness Aware Minimization}},
  author    = {Zhang, Mingda and Zhu, Mingli and Zhu, Zihao and Shen, Li and Wu, Baoyuan},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/zhang2026iclr-reliable/}
}