Hidden Poison: Machine Unlearning Enables Camouflaged Poisoning Attacks

Abstract

We introduce camouflaged data poisoning attacks, a new attack vector that arises in the context of machine unlearning and other settings where model retraining may be induced. An adversary first adds a few carefully crafted points to the training dataset such that the impact on the model's predictions is minimal. The adversary subsequently triggers a request to remove a subset of the introduced points, at which point the attack is unleashed and the model's predictions are negatively affected. In particular, we consider clean-label targeted attacks (in which the goal is to cause the model to misclassify a specific test point) on datasets including CIFAR-10, Imagenette, and Imagewoof. This attack is realized by constructing camouflage datapoints that mask the effect of a poisoned dataset.
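The sketch below is only a toy illustration of the two-stage threat model described in the abstract, with exact retraining standing in for the unlearning mechanism; it is not the paper's clean-label, gradient-matching construction of the poison and camouflage sets. The clusters, the target point, and the helper `fit` are invented for illustration.

```python
# Toy illustration of camouflaged poisoning: a model trained on
# clean + poison + camouflage data behaves normally on the target,
# but once the camouflage points are "unlearned" (here: exact retraining
# without them), the poisons flip the target's prediction.
# Hypothetical setup, not the construction used in the paper.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Clean training data: two well-separated, deterministic clusters.
grid = np.array([[dx, dy] for dx in (-0.5, 0.0, 0.5) for dy in (-0.5, 0.0, 0.5)])
X_clean = np.vstack([grid + [-3.0, -3.0],   # class 0 cluster
                     grid + [+3.0, +3.0]])  # class 1 cluster
y_clean = np.array([0] * len(grid) + [1] * len(grid))

# Target test point the adversary wants misclassified (true class: 1).
x_target = np.array([[1.5, 1.5]])

# Poison set: points near the target carrying the wrong label (class 0).
X_poison = x_target + np.array([[0.10, 0.0], [0.0, 0.10], [-0.10, 0.0]])
y_poison = np.array([0, 0, 0])

# Camouflage set: points even closer to the target with the correct label,
# masking the poisons while they remain in the training set.
X_camo = x_target + np.array([[0.02, 0.0], [0.0, 0.02], [-0.02, 0.0]])
y_camo = np.array([1, 1, 1])

def fit(parts):
    """Train a 3-nearest-neighbour classifier on the given (X, y) parts."""
    Xs, ys = zip(*parts)
    return KNeighborsClassifier(n_neighbors=3).fit(np.vstack(Xs), np.concatenate(ys))

# Stage 1: train on clean + poison + camouflage; the camouflage points are
# the target's nearest neighbours, so the target is still classified correctly.
full = fit([(X_clean, y_clean), (X_poison, y_poison), (X_camo, y_camo)])
print("prediction before unlearning:", full.predict(x_target))   # -> [1]

# Stage 2: the adversary requests deletion of the camouflage points; after
# retraining without them, the poisons dominate the target's neighbourhood.
unlearned = fit([(X_clean, y_clean), (X_poison, y_poison)])
print("prediction after unlearning: ", unlearned.predict(x_target))  # -> [0]
```

In this toy version the "unlearning" is simply retraining from scratch on the remaining data, which is the strongest (exact) form of deletion and still suffices to trigger the attack once the camouflage points are gone.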

Cite

Text

Di et al. "Hidden Poison: Machine Unlearning Enables Camouflaged Poisoning Attacks." NeurIPS 2022 Workshops: TSRML, 2022.

Markdown

[Di et al. "Hidden Poison: Machine Unlearning Enables Camouflaged Poisoning Attacks." NeurIPS 2022 Workshops: TSRML, 2022.](https://mlanthology.org/neuripsw/2022/di2022neuripsw-hidden-a/)

BibTeX

@inproceedings{di2022neuripsw-hidden-a,
  title     = {{Hidden Poison: Machine Unlearning Enables Camouflaged Poisoning Attacks}},
  author    = {Di, Jimmy Z. and Douglas, Jack and Acharya, Jayadev and Kamath, Gautam and Sekhari, Ayush},
  booktitle = {NeurIPS 2022 Workshops: TSRML},
  year      = {2022},
  url       = {https://mlanthology.org/neuripsw/2022/di2022neuripsw-hidden-a/}
}