With False Friends like These, Who Can Notice Mistakes?

Abstract

Adversarial examples crafted by an explicit adversary have attracted significant attention in machine learning. However, the security risk posed by a potential false friend has been largely overlooked. In this paper, we unveil the threat of hypocritical examples: inputs that are originally misclassified yet perturbed by a false friend to force correct predictions. While such perturbed examples seem harmless, we point out for the first time that they could be maliciously used to conceal the mistakes of a substandard (i.e., not as good as required) model during an evaluation. Once a deployer trusts the hypocritical performance and applies the "well-performing" model in real-world applications, unexpected failures may happen even in benign environments. More seriously, this security risk seems to be pervasive: we find that many types of substandard models are vulnerable to hypocritical examples across multiple datasets. Furthermore, we provide the first attempt to characterize the threat with a metric called hypocritical risk and try to circumvent it via several countermeasures. Results demonstrate the effectiveness of the countermeasures, while the risk remains non-negligible even after adaptive robust training.
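The mechanism behind a hypocritical example is the mirror image of an adversarial attack: instead of ascending the classification loss, a false friend descends it within a small perturbation budget, nudging a misclassified input back across the decision boundary. Below is a minimal PyTorch sketch of this idea; the function name, the L-infinity budget, and all hyperparameters are illustrative assumptions rather than the paper's exact attack.

import torch
import torch.nn.functional as F

def hypocritical_perturb(model, x, y, eps=8/255, alpha=2/255, steps=10):
    # PGD-style search that *descends* the loss on the true labels y, so that
    # inputs the model misclassifies get nudged back to correct predictions
    # within an L-infinity ball of radius eps. Hyperparameters are illustrative.
    x_hyp = x.clone().detach()
    for _ in range(steps):
        x_hyp.requires_grad_(True)
        loss = F.cross_entropy(model(x_hyp), y)
        grad = torch.autograd.grad(loss, x_hyp)[0]
        # Minus sign: unlike an adversary, the "false friend" helps the model.
        x_hyp = x_hyp.detach() - alpha * grad.sign()
        x_hyp = x + (x_hyp - x).clamp(-eps, eps)  # project into the budget
        x_hyp = x_hyp.clamp(0.0, 1.0)             # keep pixels valid
    return x_hyp.detach()

# Toy usage: a random linear "classifier" standing in for a substandard model.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10)).eval()
x, y = torch.rand(8, 3, 32, 32), torch.randint(0, 10, (8,))
with torch.no_grad():
    wrong = model(x).argmax(dim=1) != y  # the mistakes a false friend would hide
x_eval = x.clone()
x_eval[wrong] = hypocritical_perturb(model, x[wrong], y[wrong])

Evaluating the model on x_eval would report deceptively high accuracy, which is exactly the threat the abstract describes: only the misclassified inputs are perturbed, so the inflated score conceals every mistake the model would make on clean data.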

Cite

Text

Tao et al. "With False Friends like These, Who Can Notice Mistakes?" AAAI Conference on Artificial Intelligence, 2022. doi:10.1609/AAAI.V36I8.20822

Markdown

[Tao et al. "With False Friends like These, Who Can Notice Mistakes?" AAAI Conference on Artificial Intelligence, 2022.](https://mlanthology.org/aaai/2022/tao2022aaai-false/) doi:10.1609/AAAI.V36I8.20822

BibTeX

@inproceedings{tao2022aaai-false,
  title     = {{With False Friends like These, Who Can Notice Mistakes?}},
  author    = {Tao, Lue and Feng, Lei and Yi, Jinfeng and Chen, Songcan},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2022},
  pages     = {8458--8466},
  doi       = {10.1609/AAAI.V36I8.20822},
  url       = {https://mlanthology.org/aaai/2022/tao2022aaai-false/}
}