AUTE: Peer-Alignment and Self-Unlearning Boost Adversarial Robustness for Training Ensemble Models

Abstract

Adversarial attacks pose a significant threat to the security of AI-based systems. To counteract these attacks, adversarial training (AT) and ensemble learning (EL) have emerged as widely adopted methods for enhancing model robustness. However, a counter-intuitive phenomenon arises: simply combining these approaches may compromise the adversarial robustness of ensemble models. In this paper, we propose a novel method called Alignment and Unlearning for Training Ensembles (AUTE), which aims to integrate AT and EL effectively and maximize their benefits. Specifically, AUTE incorporates two key components. First, AUTE repeatedly divides the ensemble into a big peer model and a single member, cycling through the members in a loop, and aligns their outputs to boost the robustness of each member. Second, AUTE introduces the concept of unlearning, actively forgetting specific over-confident data to preserve the model's capacity to learn more robust features. Extensive experiments across various datasets and networks show that AUTE achieves superior performance compared to baselines. For instance, a 5-member AUTE with ResNet-20 networks outperforms the state-of-the-art method by 2.1% and 3.2% in classifying clean and adversarial data, respectively. Additionally, AUTE extends easily to the non-adversarial training paradigm, surpassing current standard ensemble learning methods by a large margin.
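
The two components described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the loss functions, so the sketch assumes that the "big peer model" is the average softmax output of all other members, that alignment is measured with a KL divergence, and that over-confident data are samples whose top predicted probability exceeds a threshold. All function names and the `threshold` value are hypothetical. How flagged samples are then forgotten is not stated in the abstract, so the sketch only flags them.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one logit vector."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions (eps avoids log(0))."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def peer_alignment_losses(member_logits):
    """Assumed alignment term for one input.

    member_logits: list with one logit vector per ensemble member.
    For each member i, the peer is the mean prediction of the other
    members (an assumption), and the loss is KL(peer || member i).
    """
    probs = [softmax(z) for z in member_logits]
    n = len(probs)
    losses = []
    for i in range(n):  # loop over members, one held out at a time
        others = [probs[j] for j in range(n) if j != i]
        peer = [sum(col) / len(others) for col in zip(*others)]
        losses.append(kl_divergence(peer, probs[i]))
    return losses

def overconfident_mask(batch_probs, threshold=0.99):
    """Flag samples whose top probability exceeds a hypothetical threshold."""
    return [max(p) > threshold for p in batch_probs]

if __name__ == "__main__":
    # Two members agree, the third disagrees: its alignment loss is largest.
    logits = [[2.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
    print(peer_alignment_losses(logits))
    print(overconfident_mask([[0.999, 0.0005, 0.0005], [0.6, 0.3, 0.1]]))
```

In this sketch, a member that disagrees with its peers receives a larger alignment loss than the members that agree with each other, which is the behavior an alignment term would need in order to pull each member toward the rest of the ensemble.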

Cite

Text

Huang et al. "AUTE: Peer-Alignment and Self-Unlearning Boost Adversarial Robustness for Training Ensemble Models." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I4.32382

Markdown

[Huang et al. "AUTE: Peer-Alignment and Self-Unlearning Boost Adversarial Robustness for Training Ensemble Models." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/huang2025aaai-aute/) doi:10.1609/AAAI.V39I4.32382

BibTeX

@inproceedings{huang2025aaai-aute,
  title     = {{AUTE: Peer-Alignment and Self-Unlearning Boost Adversarial Robustness for Training Ensemble Models}},
  author    = {Huang, Lifeng and Su, Tian and Gao, Chengying and Liu, Ning and Huang, Qiong},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {3671-3679},
  doi       = {10.1609/AAAI.V39I4.32382},
  url       = {https://mlanthology.org/aaai/2025/huang2025aaai-aute/}
}