Don't Shift the Trigger: Robust Gradient Ascent for Backdoor Unlearning

Abstract

Backdoor attacks pose a significant threat to machine learning models, allowing adversaries to implant hidden triggers that alter model behavior when activated. Although gradient ascent (GA)-based unlearning has been proposed as an efficient backdoor removal approach, we identify a critical yet overlooked issue: vanilla GA does not eliminate the trigger but shifts its impact to different classes, a phenomenon we call trigger shifting. To address this, we propose Robust Gradient Ascent (RGA), which introduces a dynamic penalty mechanism to regulate GA's strength and prevent excessive unlearning. Our experiments show that RGA effectively removes backdoors while preserving model utility, offering a more reliable defense against backdoor attacks.
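The abstract does not spell out the RGA formulation, so the following is only a minimal sketch of the general idea, assuming a toy linear softmax classifier and a single poisoned input. Vanilla GA ascends the cross-entropy loss on the backdoor target label with a fixed step; the damped variant multiplies each step by the model's current probability for that label, so the ascent weakens as the backdoor association fades. This damping factor, along with the names `ascent_step`, `unlearn`, `W0`, and `x_trigger`, is a hypothetical illustration and not the penalty proposed in the paper.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(W, x):
    # Linear softmax classifier: one weight row per class.
    logits = [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in W]
    return softmax(logits)

def loss(W, x, target):
    # Cross-entropy of the poisoned input against the backdoor target label.
    return -math.log(predict(W, x)[target])

def ascent_step(W, x, target, lr, damped):
    probs = predict(W, x)
    # Assumption (not from the paper): damp the step by the current
    # probability of the backdoor target, treated as a constant.
    scale = probs[target] if damped else 1.0
    new_W = []
    for k, row in enumerate(W):
        # Gradient of the loss w.r.t. logit k is p_k - 1[k == target].
        err = probs[k] - (1.0 if k == target else 0.0)
        # Gradient *ascent*: move the weights along the loss gradient.
        new_W.append([w + lr * scale * err * x_i for w, x_i in zip(row, x)])
    return new_W

def unlearn(W, x, target, lr=0.5, steps=50, damped=False):
    for _ in range(steps):
        W = ascent_step(W, x, target, lr, damped)
    return W

# Toy setup: three classes, and the model starts out confident that the
# triggered input belongs to the backdoor target class 0.
W0 = [[2.0, 2.0], [0.0, 0.0], [0.0, 0.0]]
x_trigger = [1.0, 1.0]
target = 0

W_vanilla = unlearn(W0, x_trigger, target, damped=False)
W_damped = unlearn(W0, x_trigger, target, damped=True)

print("initial loss:", loss(W0, x_trigger, target))
print("vanilla GA  :", loss(W_vanilla, x_trigger, target))
print("damped GA   :", loss(W_damped, x_trigger, target))
```

In this toy setting the undamped update keeps pushing the loss up for as long as it runs, and the probability removed from the target class necessarily lands on the remaining classes. The damped update slows down once the target probability is small. Whether this resembles the trigger-shifting behaviour or the dynamic penalty studied in the paper is not something the sketch can establish.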

Cite

Text

Zhao et al. "Don't Shift the Trigger: Robust Gradient Ascent for Backdoor Unlearning." International Conference on Learning Representations, 2026.

Markdown

[Zhao et al. "Don't Shift the Trigger: Robust Gradient Ascent for Backdoor Unlearning." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zhao2026iclr-don/)

BibTeX

@inproceedings{zhao2026iclr-don,
  title     = {{Don't Shift the Trigger: Robust Gradient Ascent for Backdoor Unlearning}},
  author    = {Zhao, Xingyi and Xie, Tian and Qi, Xiaojun and Xu, Depeng and Yuan, Shuhan},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/zhao2026iclr-don/}
}