BagFlip: A Certified Defense Against Data Poisoning

Abstract

Machine learning models are vulnerable to data-poisoning attacks, in which an attacker maliciously modifies the training set to change the prediction of a learned model. In a trigger-less attack, the attacker can modify the training set but not the test inputs, while in a backdoor attack the attacker can also modify test inputs. Existing model-agnostic defense approaches either cannot handle backdoor attacks or do not provide effective certificates (i.e., a proof of a defense). We present BagFlip, a model-agnostic certified approach that can effectively defend against both trigger-less and backdoor attacks. We evaluate BagFlip on image classification and malware detection datasets. BagFlip is equal to or more effective than state-of-the-art approaches for trigger-less attacks, and more effective than them for backdoor attacks.
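
The abstract does not spell out the mechanism, but the name points at two ingredients common in model-agnostic certified poisoning defenses: bagging (training many base models on small random subsamples of the training set) and randomized feature flipping (noise applied to training examples and to the test input), combined via majority vote. The Python sketch below is only a hypothetical illustration of that bagging-plus-flipping combination, not the paper's certified algorithm; every name and parameter here (`flip_features`, `train_bagged_models`, the nearest-centroid base learner, `flip_prob`, `bag_size`) is an assumption, and the certificate computation that makes such a defense "certified" is omitted entirely.

```python
import numpy as np

rng = np.random.default_rng(0)

def flip_features(x, p):
    # Independently flip each binary (0/1) feature with probability p.
    mask = rng.random(x.shape) < p
    return np.where(mask, 1.0 - x, x)

def train_bagged_models(X, y, n_models=100, bag_size=50, flip_prob=0.1):
    # Hypothetical sketch: one toy base model per random bag, with
    # feature flipping applied to each bag ("bag" + "flip").
    models = []
    n = len(X)
    for _ in range(n_models):
        idx = rng.integers(0, n, size=bag_size)      # bag sampled with replacement
        Xb = flip_features(X[idx], flip_prob)
        yb = y[idx]
        # Toy base learner: per-class mean (nearest centroid), chosen
        # purely for brevity; the real defense is model-agnostic.
        centroids = {c: Xb[yb == c].mean(axis=0) for c in np.unique(yb)}
        models.append(centroids)
    return models

def predict(models, x, flip_prob=0.1):
    # Majority vote over base models; the test input is also randomized,
    # which is what lets smoothing-style defenses reason about backdoors.
    votes = {}
    for centroids in models:
        xn = flip_features(x, flip_prob)
        pred = min(centroids, key=lambda c: np.linalg.norm(xn - centroids[c]))
        votes[pred] = votes.get(pred, 0) + 1
    return max(votes, key=votes.get)

# Toy usage on synthetic binary-feature data.
X = rng.integers(0, 2, size=(200, 20)).astype(float)
y = (X[:, 0] == X[:, 1]).astype(int)
models = train_bagged_models(X, y)
print(predict(models, X[0]))
```

In a real certified defense, the vote margin between the top two classes would additionally be converted into a bound on how many poisoned training examples (and, for backdoor attacks, how many flipped test features) the prediction provably tolerates; that analysis is the paper's contribution and is not reproduced in this sketch.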

Cite

Text

Zhang et al. "BagFlip: A Certified Defense Against Data Poisoning." Neural Information Processing Systems, 2022.

Markdown

[Zhang et al. "BagFlip: A Certified Defense Against Data Poisoning." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/zhang2022neurips-bagflip/)

BibTeX

@inproceedings{zhang2022neurips-bagflip,
  title     = {{BagFlip: A Certified Defense Against Data Poisoning}},
  author    = {Zhang, Yuhao and Albarghouthi, Aws and D'Antoni, Loris},
  booktitle = {Neural Information Processing Systems},
  year      = {2022},
  url       = {https://mlanthology.org/neurips/2022/zhang2022neurips-bagflip/}
}