SelectIT: Selective Instruction Tuning for LLMs via Uncertainty-Aware Self-Reflection

Liu, Liangxin; Liu, Xuebo; Wong, Derek F.; Li, Dongfang; Wang, Ziyi; Hu, Baotian; Zhang, Min

doi:10.52202/079017-3102

NeurIPS 2024

Abstract

Instruction tuning (IT) is crucial to tailoring large language models (LLMs) towards human-centric interactions. Recent advancements have shown that the careful selection of a small, high-quality subset of IT data can significantly enhance the performance of LLMs. Despite this, common approaches often rely on additional models or data, which increases costs and limits widespread adoption. In this work, we propose a novel approach, termed $\textit{SelectIT}$, that capitalizes on the foundational capabilities of the LLM itself. Specifically, we exploit the intrinsic uncertainty present in LLMs to more effectively select high-quality IT data, without the need for extra resources. Furthermore, we introduce a curated IT dataset, the $\textit{Selective Alpaca}$, created by applying SelectIT to the Alpaca-GPT4 dataset. Empirical results demonstrate that IT using Selective Alpaca leads to substantial model ability enhancement. The robustness of SelectIT has also been corroborated in various foundation models and domain-specific tasks. Our findings suggest that longer and more computationally intensive IT data may serve as superior sources of IT, offering valuable insights for future research in this area. Data, code, and scripts are freely available at https://github.com/Blue-Raincoat/SelectIT.
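The abstract does not spell out the scoring rule, so the following is only a minimal sketch of what uncertainty-aware data selection could look like, not the authors' implementation. It assumes the LLM has already rated each IT sample several times (for example, under different rating prompts), and that disagreement among those ratings is used as the uncertainty signal. The function names, the `alpha` penalty weight, and the "mean divided by one plus weighted standard deviation" formula are all hypothetical choices made for illustration.

```python
from statistics import mean, pstdev


def uncertainty_aware_score(ratings, alpha=0.2):
    """Mean rating discounted by how much the ratings disagree.

    ratings: scores (e.g. 1-5) that the model assigned to ONE sample
             under several rating prompts (assumed to be precomputed).
    alpha:   hypothetical weight on the uncertainty penalty.
    """
    return mean(ratings) / (1.0 + alpha * pstdev(ratings))


def select_top_fraction(samples, ratings_per_sample, fraction=0.2, alpha=0.2):
    """Keep the highest-scoring fraction of samples, in original order."""
    scored = [
        (uncertainty_aware_score(r, alpha), i)
        for i, r in enumerate(ratings_per_sample)
    ]
    # Sort by descending score; break ties by original position.
    scored.sort(key=lambda t: (-t[0], t[1]))
    keep = max(1, round(len(samples) * fraction))
    kept_indices = sorted(i for _, i in scored[:keep])
    return [samples[i] for i in kept_indices]


if __name__ == "__main__":
    # Toy data: four samples, each rated three times (illustrative only).
    samples = ["a", "b", "c", "d"]
    ratings = [[5, 5, 5], [5, 1, 3], [4, 4, 4], [2, 2, 2]]
    print(select_top_fraction(samples, ratings, fraction=0.5))
```

In this toy setup, a sample whose ratings are consistent outranks one whose ratings swing widely, which is the intuition behind selecting data with the model's own uncertainty rather than with an external scorer.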

Cite

Text

Liu et al. "SelectIT: Selective Instruction Tuning for LLMs via Uncertainty-Aware Self-Reflection." Neural Information Processing Systems, 2024. doi:10.52202/079017-3102

Markdown

[Liu et al. "SelectIT: Selective Instruction Tuning for LLMs via Uncertainty-Aware Self-Reflection." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/liu2024neurips-selectit/) doi:10.52202/079017-3102

BibTeX

@inproceedings{liu2024neurips-selectit,
  title     = {{SelectIT: Selective Instruction Tuning for LLMs via Uncertainty-Aware Self-Reflection}},
  author    = {Liu, Liangxin and Liu, Xuebo and Wong, Derek F. and Li, Dongfang and Wang, Ziyi and Hu, Baotian and Zhang, Min},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-3102},
  url       = {https://mlanthology.org/neurips/2024/liu2024neurips-selectit/}
}