Automatic Construction of a Korean Toxic Instruction Dataset for Ethical Tuning of Large Language Models
Abstract
***Caution**: this paper may include material that could be offensive or distressing.* The advent of Large Language Models (LLMs) necessitates the development of training approaches that mitigate the generation of unethical language and aptly manage toxic user queries. Given the challenges related to human labor and the scarcity of data, we present KoTox, comprising 39K unethical instruction-output pairs. This collection of automatically generated toxic instructions refines the training of LLMs and establishes a foundational framework for improving LLMs' ethical awareness and response to various toxic inputs, promoting more secure and responsible interactions in Natural Language Processing (NLP) applications.
Cite
Text
Byun et al. "Automatic Construction of a Korean Toxic Instruction Dataset for Ethical Tuning of Large Language Models." NeurIPS 2023 Workshops: Instruction, 2023.
Markdown
[Byun et al. "Automatic Construction of a Korean Toxic Instruction Dataset for Ethical Tuning of Large Language Models." NeurIPS 2023 Workshops: Instruction, 2023.](https://mlanthology.org/neuripsw/2023/byun2023neuripsw-automatic/)
BibTeX
@inproceedings{byun2023neuripsw-automatic,
title = {{Automatic Construction of a Korean Toxic Instruction Dataset for Ethical Tuning of Large Language Models}},
author = {Byun, SungJoo and Jang, Dongjun and Jo, Hyemi and Shin, Hyopil},
booktitle = {NeurIPS 2023 Workshops: Instruction},
year = {2023},
url = {https://mlanthology.org/neuripsw/2023/byun2023neuripsw-automatic/}
}