Adversarial Attacks on the Interpretation of Neuron Activation Maximization
Abstract
Feature visualization is one of the most popular techniques used to interpret the internal behavior of individual units of trained deep neural networks. Based on activation maximization, these methods consist of finding synthetic or natural inputs that maximize neuron activations. This paper introduces an optimization framework that aims to deceive feature visualization through adversarial model manipulation. It consists of finetuning a pre-trained model with a specifically introduced loss that maintains model performance while significantly changing feature visualizations. We provide evidence of the success of this manipulation on several pre-trained models for the classification task with ImageNet.
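To make the idea concrete, here is a minimal NumPy sketch of the mechanism the abstract describes, on a toy linear unit rather than a deep network. For a linear unit, activation maximization has a closed form (the weight direction), so "changing the visualization while keeping behavior" reduces to rotating the weight vector while preserving its outputs on data. The combined loss, the target direction `t`, and all names below are our illustrative assumptions, not the paper's actual objective.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 16, 200

# Toy "model": one linear unit with activation a(x) = w @ x.
# For such a unit, activation maximization over the unit sphere yields
# x* = w / ||w||, so the unit's feature visualization is w's direction.
def visualize(w):
    return w / np.linalg.norm(w)

w_orig = rng.normal(size=d)
X = rng.normal(size=(n, d))       # probe inputs standing in for the task data
y = X @ w_orig                    # original outputs, to be preserved

v0 = visualize(w_orig)            # visualization before manipulation
t = rng.normal(size=d)
t /= np.linalg.norm(t)            # arbitrary target direction for the new visualization

# Hypothetical manipulation objective (our naming, not the paper's exact loss):
#   L(w) = mean((X w - y)^2)  -  lam * (t . w/||w||)
# The first term keeps the model's behavior on data ("maintain performance");
# the second drags the visualization (w's direction) toward an unrelated target.
lam, lr = 3.0, 0.05
w = w_orig.copy()
for _ in range(2000):
    grad_task = 2.0 * X.T @ (X @ w - y) / n
    wn = np.linalg.norm(w)
    u = w / wn
    grad_vis = -(t - (t @ u) * u) / wn    # gradient of -(t . w/||w||)
    w -= lr * (grad_task + lam * grad_vis)

cos_vis = float(visualize(w) @ v0)           # < 1: the visualization moved
task_err = float(np.mean((X @ w - y) ** 2))  # small: behavior roughly preserved
```

After finetuning, `cos_vis` drops below 1 (the unit's visualization has rotated) while `task_err` stays small (the unit still computes nearly the same function on the probe data), mirroring the trade-off the paper's loss is designed to exploit.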
Cite
Text
Nanfack et al. "Adversarial Attacks on the Interpretation of Neuron Activation Maximization." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I5.28228
Markdown
[Nanfack et al. "Adversarial Attacks on the Interpretation of Neuron Activation Maximization." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/nanfack2024aaai-adversarial/) doi:10.1609/AAAI.V38I5.28228
BibTeX
@inproceedings{nanfack2024aaai-adversarial,
title = {{Adversarial Attacks on the Interpretation of Neuron Activation Maximization}},
author = {Nanfack, Géraldin and Fulleringer, Alexander and Marty, Jonathan and Eickenberg, Michael and Belilovsky, Eugene},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2024},
  pages = {4315--4324},
doi = {10.1609/AAAI.V38I5.28228},
url = {https://mlanthology.org/aaai/2024/nanfack2024aaai-adversarial/}
}