FairSISA: Ensemble Post-Processing to Improve Fairness of Unlearning in LLMs
Abstract
Training large language models (LLMs) is a costly endeavour in terms of time and computational resources. The large amount of training data used during the unsupervised pre-training phase makes it difficult to verify all data and, unfortunately, undesirable data may be ingested during training. Re-training from scratch is impractical and has led to the creation of the \textit{unlearning} discipline, where models are modified to ``unlearn'' undesirable information without retraining. However, any modification can alter the behaviour of LLMs, especially on key dimensions such as \textit{fairness}. This is the first work that examines the interplay between unlearning and fairness for LLMs. In particular, we focus on a popular unlearning framework known as SISA [Bourtoule et al., 2021], which creates an ensemble of models trained on disjoint shards. We evaluate the performance-fairness trade-off for SISA, and empirically demonstrate that SISA can indeed reduce fairness in LLMs. To remedy this, we propose post-processing bias mitigation techniques for ensemble models produced by SISA. Through experimental results, we demonstrate the efficacy of our post-processing framework called \textit{FairSISA}.
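To make the sharded-ensemble idea concrete, below is a minimal illustrative sketch of SISA-style training and unlearning in the spirit of Bourtoule et al. [2021]. It is not the authors' implementation: the shard count, the scikit-learn logistic-regression constituents, the toy data, and the majority-vote aggregation are all assumptions made for illustration only.

```python
# Illustrative sketch of SISA-style sharded training, ensemble aggregation,
# and per-shard unlearning. NOT the paper's code; model choice, shard count,
# and majority voting are assumptions for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1200, 8))            # toy features
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # toy binary labels

NUM_SHARDS = 4
# Partition the training set into disjoint shards.
shard_idx = np.array_split(rng.permutation(len(X)), NUM_SHARDS)

def train_shard(indices):
    """Train one constituent model on a single disjoint shard."""
    return LogisticRegression().fit(X[indices], y[indices])

models = [train_shard(idx) for idx in shard_idx]

def ensemble_predict(x):
    """Aggregate the shard models by majority vote (one common SISA aggregation)."""
    votes = np.stack([m.predict(x) for m in models])
    return (votes.mean(axis=0) >= 0.5).astype(int)

def unlearn(sample_id):
    """Unlearning a sample only requires retraining the shard that contains it."""
    for s, idx in enumerate(shard_idx):
        if sample_id in idx:
            shard_idx[s] = idx[idx != sample_id]
            models[s] = train_shard(shard_idx[s])
            return s

print("accuracy before unlearning:", (ensemble_predict(X) == y).mean())
unlearn(sample_id=42)
print("accuracy after unlearning :", (ensemble_predict(X) == y).mean())
```

The sketch highlights the property the paper builds on: predictions come from an ensemble over disjoint shards, so any post-processing bias mitigation (as FairSISA proposes) has to operate on the aggregated ensemble output rather than on a single model.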
Cite
Text
Kadhe et al. "FairSISA: Ensemble Post-Processing to Improve Fairness of Unlearning in LLMs." NeurIPS 2023 Workshops: SoLaR, 2023.
Markdown
[Kadhe et al. "FairSISA: Ensemble Post-Processing to Improve Fairness of Unlearning in LLMs." NeurIPS 2023 Workshops: SoLaR, 2023.](https://mlanthology.org/neuripsw/2023/kadhe2023neuripsw-fairsisa/)
BibTeX
@inproceedings{kadhe2023neuripsw-fairsisa,
  title = {{FairSISA: Ensemble Post-Processing to Improve Fairness of Unlearning in LLMs}},
  author = {Kadhe, Swanand and Halimi, Anisa and Rawat, Ambrish and Baracaldo, Nathalie},
  booktitle = {NeurIPS 2023 Workshops: SoLaR},
  year = {2023},
  url = {https://mlanthology.org/neuripsw/2023/kadhe2023neuripsw-fairsisa/}
}