Deep Feature Space Trojan Attack of Neural Networks by Controlled Detoxification

Abstract

A trojan (backdoor) attack is a form of adversarial attack on deep neural networks in which the attacker provides victims with a model trained/retrained on malicious data. The backdoor is activated when a normal input is stamped with a certain pattern called the trigger, causing misclassification. Many existing trojan attacks use input-space patches/objects (e.g., a polygon with solid color) or simple input transformations such as Instagram filters as triggers. These simple triggers are susceptible to recent backdoor detection algorithms. We propose a novel deep feature space trojan attack with five characteristics: effectiveness, stealthiness, controllability, robustness, and reliance on deep features. We conduct extensive experiments with 9 image classifiers on various datasets, including ImageNet, to demonstrate these properties and show that our attack can evade state-of-the-art defenses.
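
For context, the "input-space patch" triggers that the abstract contrasts against can be illustrated with a minimal data-poisoning sketch: a solid-color square is stamped onto a clean image and the sample is relabeled with the attacker's target class. This is not the paper's deep feature space attack; the function name, patch size, and location below are illustrative assumptions.

import numpy as np

def stamp_patch_trigger(image: np.ndarray,
                        patch_size: int = 8,
                        color: float = 1.0) -> np.ndarray:
    """Overlay a solid-color square patch in the bottom-right corner.

    `image` is assumed to be an HxWxC float array with values in [0, 1].
    """
    poisoned = image.copy()
    poisoned[-patch_size:, -patch_size:, :] = color  # solid-color patch trigger
    return poisoned

# Usage: poison a clean sample and relabel it with the attacker-chosen target class.
clean = np.random.rand(32, 32, 3)   # stand-in for a CIFAR-sized image
target_class = 7                    # hypothetical target label
poisoned_sample = (stamp_patch_trigger(clean), target_class)

Because such a trigger is a fixed, localized input-space pattern, it is exactly the kind of artifact that recent backdoor scanners can reverse-engineer, which motivates the feature-space triggers studied in the paper.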

Cite

Text

Cheng et al. "Deep Feature Space Trojan Attack of Neural Networks by Controlled Detoxification." AAAI Conference on Artificial Intelligence, 2021. doi:10.1609/AAAI.V35I2.16201

Markdown

[Cheng et al. "Deep Feature Space Trojan Attack of Neural Networks by Controlled Detoxification." AAAI Conference on Artificial Intelligence, 2021.](https://mlanthology.org/aaai/2021/cheng2021aaai-deep/) doi:10.1609/AAAI.V35I2.16201

BibTeX

@inproceedings{cheng2021aaai-deep,
  title     = {{Deep Feature Space Trojan Attack of Neural Networks by Controlled Detoxification}},
  author    = {Cheng, Siyuan and Liu, Yingqi and Ma, Shiqing and Zhang, Xiangyu},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2021},
  pages     = {1148--1156},
  doi       = {10.1609/AAAI.V35I2.16201},
  url       = {https://mlanthology.org/aaai/2021/cheng2021aaai-deep/}
}