Chained Tuning Leads to Biased Forgetting
Abstract
Large language models (LLMs) are often fine-tuned for use on downstream tasks, though this can degrade capabilities learned during previous training. This phenomenon, often referred to as catastrophic forgetting, has important potential implications for the safety of deployed models. In this work, we first show that models trained on downstream tasks forget their safety tuning to a greater extent than models trained in the opposite order. Second, we show that forgetting disproportionately impacts safety information about certain groups. To quantify this phenomenon, we define a new metric we term biased forgetting, and conduct a systematic evaluation of the effects of several fine-tuning methods and hyper-parameters on forgetting. We hope our findings can better inform methods for chaining the fine-tuning of LLMs in continual learning settings to enable training of safer and less toxic models.
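The abstract does not spell out how the biased forgetting metric is computed, so the sketch below is only an illustration of the general idea it describes: measure a per-group safety score before and after downstream fine-tuning, and compare how much each group's score drops. All function names (`forgetting`, `biased_forgetting_gap`) and numbers are hypothetical and not taken from the paper.

```python
# Illustrative sketch only -- NOT the paper's definition of biased forgetting.
# Assumes per-group safety scores (e.g., accuracy on a safety benchmark,
# broken down by demographic group) measured before and after a model is
# subsequently fine-tuned on a downstream task.

def forgetting(before: float, after: float) -> float:
    """Drop in safety score for one group after downstream fine-tuning."""
    return before - after

def biased_forgetting_gap(before: dict[str, float], after: dict[str, float]) -> float:
    """Spread between the most- and least-affected groups (hypothetical measure)."""
    drops = {group: forgetting(before[group], after[group]) for group in before}
    return max(drops.values()) - min(drops.values())

# Hypothetical per-group safety scores before and after downstream fine-tuning.
before = {"group_a": 0.92, "group_b": 0.90}
after  = {"group_a": 0.88, "group_b": 0.71}  # group_b is forgotten disproportionately

print(biased_forgetting_gap(before, after))  # ~0.15
```

A gap near zero would indicate that forgetting is roughly uniform across groups; a large gap indicates that safety behavior for some groups degrades much faster than for others, which is the disproportionate effect the abstract refers to.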
Cite
Text
Ung et al. "Chained Tuning Leads to Biased Forgetting." ICML 2024 Workshops: NextGenAISafety, 2024.
Markdown
[Ung et al. "Chained Tuning Leads to Biased Forgetting." ICML 2024 Workshops: NextGenAISafety, 2024.](https://mlanthology.org/icmlw/2024/ung2024icmlw-chained/)
BibTeX
@inproceedings{ung2024icmlw-chained,
  title     = {{Chained Tuning Leads to Biased Forgetting}},
  author    = {Ung, Megan and Sun, Alicia Yi and Bell, Samuel and Sagun, Levent and Williams, Adina},
  booktitle = {ICML 2024 Workshops: NextGenAISafety},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/ung2024icmlw-chained/}
}