Prototype Guided Backdoor Defense via Activation Space Manipulation
Abstract
Deep learning models are susceptible to backdoor attacks, in which some training data is maliciously perturbed with a trigger to force misclassification to a target class. Various triggers have been used, including semantic triggers that are easily realizable. We present Prototype Guided Backdoor Defense (PGBD), a robust post-hoc defense that scales across different trigger types, including previously unsolved semantic triggers. PGBD exploits displacements in the geometric space of activations to penalize movement towards the trigger, using a novel sanitization loss in a post-hoc fine-tuning step. The approach generalizes across attack and trigger types and achieves better performance across settings. We also present the first defense against semantic attacks on a new celebrity face images dataset. Activation spaces can provide rich clues to enhance deep learning models in different ways.
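The abstract only sketches the idea of penalizing activation-space movement towards the trigger during fine-tuning. The snippet below is a minimal illustrative sketch of one way such a prototype-guided penalty could look; the function names, the class-mean definition of prototypes, and the exact form of the penalty are assumptions for illustration, not the paper's actual loss.

```python
import torch
import torch.nn.functional as F


def class_prototypes(activations, labels, num_classes):
    """One simple prototype choice: the mean activation vector of each class."""
    return torch.stack([
        activations[labels == c].mean(dim=0) for c in range(num_classes)
    ])  # shape: (num_classes, feat_dim)


def sanitization_loss(feats_before, feats_after, target_proto):
    """Penalize the component of the activation displacement that points
    towards a (suspected) target-class prototype, i.e. towards the trigger
    direction in activation space."""
    displacement = feats_after - feats_before                 # (N, D) movement under fine-tuning
    toward_target = F.normalize(target_proto - feats_before, dim=-1)
    # Only positive projections (movement towards the target prototype) are penalized.
    proj = (displacement * toward_target).sum(dim=-1)
    return F.relu(proj).mean()
```

In a fine-tuning loop, a term like this would be added to the usual task loss so the model is discouraged from shifting sample activations in the direction associated with the backdoor target class.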
Cite
Text
Amula et al. "Prototype Guided Backdoor Defense via Activation Space Manipulation." International Conference on Computer Vision, 2025.
Markdown
[Amula et al. "Prototype Guided Backdoor Defense via Activation Space Manipulation." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/amula2025iccv-prototype/)
BibTeX
@inproceedings{amula2025iccv-prototype,
  title     = {{Prototype Guided Backdoor Defense via Activation Space Manipulation}},
  author    = {Amula, Venkat Adithya and Samavedam, Sunayana and Saini, Saurabh and Gupta, Avani and Narayanan, P J},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {2195--2205},
  url       = {https://mlanthology.org/iccv/2025/amula2025iccv-prototype/}
}