Effective Targeted Attacks for Adversarial Self-Supervised Learning

Kim, Minseon; Ha, Hyeonjeong; Son, Sooel; Hwang, Sung Ju

NeurIPS 2023

Abstract

Recently, unsupervised adversarial training (AT) has been highlighted as a means of achieving robustness in models without any label information. Previous studies in unsupervised AT have mostly focused on implementing self-supervised learning (SSL) frameworks, which maximize the instance-wise classification loss to generate adversarial examples. However, we observe that simply maximizing the self-supervised training loss with an untargeted adversarial attack often generates ineffective adversaries that may not help improve the robustness of the trained model, especially for non-contrastive SSL frameworks without negative examples. To tackle this problem, we propose a novel positive-mining scheme for targeted adversarial attacks that generates effective adversaries for adversarial SSL frameworks. Specifically, we introduce an algorithm that selects the most confusing yet similar target example for a given instance based on entropy and similarity, and subsequently perturbs the given instance towards the selected target. Our method demonstrates significant enhancements in robustness when applied to non-contrastive SSL frameworks, and smaller but consistent robustness improvements with contrastive SSL frameworks, on the benchmark datasets.
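The two steps named in the abstract (pick a target by entropy and similarity, then perturb the instance towards it) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the scoring rule, so the additive score `similarity + weight * entropy`, the `temperature` parameter, the identity "encoder" in `targeted_step`, and all function names here are assumptions chosen only to make the idea concrete.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_target(anchor_idx, embeddings, weight=1.0, temperature=0.5):
    """Pick the batch index that is both similar to the anchor and 'confusing'.

    ASSUMPTION: 'confusing' is approximated by the entropy of a candidate's
    softmax-normalised similarities to the rest of the batch, and the two
    terms are combined additively. The paper's actual score may differ.
    """
    anchor = embeddings[anchor_idx]
    best_idx, best_score = None, -math.inf
    for i, cand in enumerate(embeddings):
        if i == anchor_idx:
            continue
        sim = cosine(anchor, cand)
        logits = [cosine(cand, e) / temperature
                  for j, e in enumerate(embeddings) if j != i]
        score = sim + weight * entropy(softmax(logits))
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx


def sign(v):
    """Sign of v as -1.0, 0.0, or 1.0."""
    return 0.0 if v == 0 else math.copysign(1.0, v)


def targeted_step(x, x_orig, target, step=0.01, eps=0.03):
    """One L-infinity projected step that moves x towards target.

    ASSUMPTION: the encoder is the identity, so the gradient of
    ||x - target||^2 with respect to x is 2 * (x - target). A real attack
    would backpropagate the SSL loss through the encoder instead.
    """
    moved = [xi - step * sign(xi - ti) for xi, ti in zip(x, target)]
    # Project back into the eps-ball around the clean input.
    return [min(max(mi, oi - eps), oi + eps) for mi, oi in zip(moved, x_orig)]
```

Setting `weight=0.0` reduces the selection to plain nearest-neighbour mining, which makes it easy to see what the entropy term adds. Repeating `targeted_step` gives a PGD-style targeted attack whose perturbation never leaves the `eps`-ball around the clean input.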

Cite

Text

Kim et al. "Effective Targeted Attacks for Adversarial Self-Supervised Learning." Neural Information Processing Systems, 2023.

Markdown

[Kim et al. "Effective Targeted Attacks for Adversarial Self-Supervised Learning." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/kim2023neurips-effective/)

BibTeX

@inproceedings{kim2023neurips-effective,
  title     = {{Effective Targeted Attacks for Adversarial Self-Supervised Learning}},
  author    = {Kim, Minseon and Ha, Hyeonjeong and Son, Sooel and Hwang, Sung Ju},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/kim2023neurips-effective/}
}