Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models
Abstract
Pre-trained language models (LMs) have been shown to easily generate toxic language. In this work, we systematically explore domain-adaptive training to reduce the toxicity of language models. We conduct this study on three dimensions: training corpus, model size, and parameter efficiency. For the training corpus, we demonstrate that using self-generated datasets consistently outperforms the existing baselines across various model sizes on both automatic and human evaluations, even when it uses a 1/3 smaller training corpus. We then comprehensively study detoxifying LMs with parameter sizes ranging from 126M up to 530B (3× larger than GPT-3), a scale that has never been studied before. We find that i) large LMs have similar toxicity levels as smaller ones given the same pre-training corpus, and ii) large LMs require greater effort to unlearn the toxic content seen at pre-training. We also explore parameter-efficient training methods for detoxification. We demonstrate that adding and training adapter-only layers in LMs not only saves a large number of parameters but also achieves a better trade-off between toxicity and perplexity than whole-model adaptation for large-scale models. Our code will be available at: https://github.com/NVIDIA/Megatron-LM/.
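The abstract refers to training adapter-only layers as a parameter-efficient alternative to whole-model adaptation. Below is a minimal, hypothetical sketch of a bottleneck adapter module in PyTorch; the class name, bottleneck size, and training setup are illustrative assumptions, not the paper's actual Megatron-LM implementation.

import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual.

    Illustrative sketch only; names and sizes are assumptions, not the
    paper's implementation.
    """
    def __init__(self, hidden_size: int, bottleneck_size: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.up = nn.Linear(bottleneck_size, hidden_size)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection preserves the frozen backbone's representation.
        return x + self.up(self.act(self.down(x)))

# In a typical parameter-efficient detoxification setup, only the adapter
# parameters would be trained while the pre-trained LM weights stay frozen.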
Cite
Text
Wang et al. "Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models." Neural Information Processing Systems, 2022.
Markdown
[Wang et al. "Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/wang2022neurips-exploring/)
BibTeX
@inproceedings{wang2022neurips-exploring,
title = {{Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models}},
author = {Wang, Boxin and Ping, Wei and Xiao, Chaowei and Xu, Peng and Patwary, Mostofa and Shoeybi, Mohammad and Li, Bo and Anandkumar, Anima and Catanzaro, Bryan},
booktitle = {Neural Information Processing Systems},
year = {2022},
url = {https://mlanthology.org/neurips/2022/wang2022neurips-exploring/}
}