From Hope to Safety: Unlearning Biases of Deep Models via Gradient Penalization in Latent Space

Dreyer, Maximilian; Pahde, Frederik; Anders, Christopher J.; Samek, Wojciech; Lapuschkin, Sebastian

doi:10.1609/AAAI.V38I19.30096

AAAI 2024, pp. 21046-21054

Abstract

Deep Neural Networks are prone to learning spurious correlations embedded in the training data, leading to potentially biased predictions. This poses risks when deploying these models for high-stakes decision-making, such as in medical applications. Current methods for post-hoc model correction either require input-level annotations, which are only possible for spatially localized biases, or augment the latent feature space, thereby hoping to enforce the right reasons. We present a novel method for model correction on the concept level that explicitly reduces model sensitivity towards biases via gradient penalization. When modeling biases via Concept Activation Vectors, we highlight the importance of choosing robust directions, as traditional regression-based approaches such as Support Vector Machines tend to result in diverging directions. We effectively mitigate biases in controlled and real-world settings on the ISIC, Bone Age, ImageNet and CelebA datasets using VGG, ResNet and EfficientNet architectures. Code and Appendix are available at https://github.com/frederikpahde/rrclarc.
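
To make the idea of penalizing gradients along a bias direction concrete, the following is a minimal, dependency-free sketch and not the authors' implementation. It assumes a toy linear head f(a) = w · a on a 3-dimensional latent space, and it estimates the concept direction as a normalized difference of class means, which is a simple stand-in chosen here for illustration rather than the estimator used in the paper. All function names and numbers are hypothetical. For a linear head the gradient of the output with respect to the latent activation is w itself, so the penalty reduces to the squared dot product of w with the concept direction.

```python
# Toy illustration only: a linear head f(a) = w . a on a 3-d latent space.
# All names and numbers are hypothetical, not taken from the paper's code.

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def mean_difference_cav(with_concept, without_concept):
    """Unit-norm direction from the mean clean latent to the mean biased latent.
    Assumption: a simple stand-in for a concept direction."""
    dim = len(with_concept[0])
    mu_pos = [sum(a[i] for a in with_concept) / len(with_concept) for i in range(dim)]
    mu_neg = [sum(a[i] for a in without_concept) / len(without_concept) for i in range(dim)]
    diff = [p - n for p, n in zip(mu_pos, mu_neg)]
    norm = dot(diff, diff) ** 0.5
    return [d / norm for d in diff]

def sensitivity_penalty(w, cav):
    """Squared directional derivative of f along the concept direction.
    For a linear head, the gradient of f w.r.t. the latent a is just w."""
    return dot(w, cav) ** 2

def penalty_step(w, cav, lr):
    """One gradient-descent step on the penalty alone: d/dw (w.h)^2 = 2 (w.h) h."""
    scale = 2.0 * dot(w, cav)
    return [wi - lr * scale * hi for wi, hi in zip(w, cav)]

# Latent activations of samples with and without the (hypothetical) bias concept.
biased = [[2.0, 0.1, 0.0], [2.2, -0.1, 0.1]]
clean = [[0.0, 0.0, 0.1], [0.2, 0.0, 0.0]]
cav = mean_difference_cav(biased, clean)

w = [1.0, 0.5, -0.3]
before = sensitivity_penalty(w, cav)
for _ in range(5):
    w = penalty_step(w, cav, lr=0.25)
after = sensitivity_penalty(w, cav)
```

In a real fine-tuning setup, a penalty of this kind would be added to the task loss with a weighting factor, and the latent gradient would come from automatic differentiation through a nonlinear network rather than from a closed form.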

Cite

Text

Dreyer et al. "From Hope to Safety: Unlearning Biases of Deep Models via Gradient Penalization in Latent Space." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I19.30096

Markdown

[Dreyer et al. "From Hope to Safety: Unlearning Biases of Deep Models via Gradient Penalization in Latent Space." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/dreyer2024aaai-hope/) doi:10.1609/AAAI.V38I19.30096

BibTeX

@inproceedings{dreyer2024aaai-hope,
  title     = {{From Hope to Safety: Unlearning Biases of Deep Models via Gradient Penalization in Latent Space}},
  author    = {Dreyer, Maximilian and Pahde, Frederik and Anders, Christopher J. and Samek, Wojciech and Lapuschkin, Sebastian},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {21046--21054},
  doi       = {10.1609/AAAI.V38I19.30096},
  url       = {https://mlanthology.org/aaai/2024/dreyer2024aaai-hope/}
}