Don't Shift the Trigger: Robust Gradient Ascent for Backdoor Unlearning
Abstract
Backdoor attacks pose a significant threat to machine learning models, allowing adversaries to implant hidden triggers that alter model behavior when activated. Although gradient ascent (GA)-based unlearning has been proposed as an efficient backdoor removal approach, we identify a critical yet overlooked issue: vanilla GA does not eliminate the trigger but shifts its impact to different classes, a phenomenon we call trigger shifting. To address this, we propose Robust Gradient Ascent (RGA), which introduces a dynamic penalty mechanism to regulate GA's strength and prevent excessive unlearning. Our experiments show that RGA effectively removes backdoors while preserving model utility, offering a more reliable defense against backdoor attacks.
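To make the trigger-shifting concern concrete, the following is a minimal toy sketch of vanilla gradient ascent on a single triggered input, assuming a linear softmax classifier. It is not the paper's experimental setup and does not implement RGA's penalty mechanism, whose form the abstract does not specify. The names `ga_step`, `predict`, `x_trigger`, and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, x):
    """Class probabilities of a linear softmax model (one weight vector per class)."""
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
    return softmax(logits)


def ga_step(weights, x, label, lr):
    """One vanilla gradient-ascent step on the cross-entropy loss for (x, label)."""
    probs = predict(weights, x)
    new_weights = []
    for c, w in enumerate(weights):
        # d(loss)/d(logit_c) = p_c - 1[c == label]
        g = probs[c] - (1.0 if c == label else 0.0)
        # Ascent: step along the gradient to *increase* the loss.
        new_weights.append([w_i + lr * g * x_i for w_i, x_i in zip(w, x)])
    return new_weights


# Hypothetical toy model: 3 classes, 2 features. The "triggered" input is
# initially classified as the attacker's target class 0.
weights = [[2.0, 0.0], [0.0, 0.5], [0.0, 0.0]]
x_trigger = [1.0, 1.0]
target = 0

before = predict(weights, x_trigger)
for _ in range(50):
    weights = ga_step(weights, x_trigger, target, lr=0.1)
after = predict(weights, x_trigger)

print("target-class probability before:", before[target])
print("target-class probability after: ", after[target])
print("predicted class after unlearning:", after.index(max(after)))
```

In this toy setting, unbounded ascent lowers the target-class probability but does so by pushing the triggered input toward another class, which loosely mirrors the behavior the abstract calls trigger shifting. A mechanism that limits the ascent strength, as RGA is described as doing, would aim to stop before that happens.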
Cite
Text
Zhao et al. "Don't Shift the Trigger: Robust Gradient Ascent for Backdoor Unlearning." International Conference on Learning Representations, 2026.
Markdown
[Zhao et al. "Don't Shift the Trigger: Robust Gradient Ascent for Backdoor Unlearning." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zhao2026iclr-don/)
BibTeX
@inproceedings{zhao2026iclr-don,
  title = {{Don't Shift the Trigger: Robust Gradient Ascent for Backdoor Unlearning}},
  author = {Zhao, Xingyi and Xie, Tian and Qi, Xiaojun and Xu, Depeng and Yuan, Shuhan},
  booktitle = {International Conference on Learning Representations},
  year = {2026},
  url = {https://mlanthology.org/iclr/2026/zhao2026iclr-don/}
}