Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning
Abstract
We introduce Instruct-SkillMix, an automated approach for creating diverse, high-quality SFT data for instruction-following. The pipeline involves two stages, each leveraging an existing powerful LLM: (1) Skill extraction: uses the LLM to extract core "skills" for instruction-following, either from existing datasets (Didolkar et al., 2024) or by directly prompting the model; (2) Data generation: uses the powerful LLM to generate (instruction, response) data that exhibit a randomly chosen pair of these skills. Here, the use of random skill combinations promotes diversity and difficulty. Vanilla SFT (i.e., no PPO, DPO, or RL methods) on data generated from Instruct-SkillMix leads to strong gains on instruction-following benchmarks such as AlpacaEval 2.0, MT-Bench, and WildBench. With just 4K examples, LLaMA-3-8B-Base achieves a 42.76% length-controlled win rate on AlpacaEval 2.0, a level similar to frontier models like Claude 3 Opus and LLaMA-3.1-405B-Instruct.
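To make the two-stage pipeline concrete, here is a minimal sketch in Python. It is not the paper's implementation: the prompt wording, the JSON output format, and the `ask` callable (any function that sends a prompt to a powerful LLM and returns its text reply) are all illustrative assumptions.

```python
import json
import random


def extract_skills(ask, seed_queries, num_skills=10):
    """Stage 1 (skill extraction): ask a powerful LLM to name core
    instruction-following skills, optionally seeded with example queries.
    `ask` is an assumed prompt->text helper, not part of the paper."""
    prompt = (
        f"List {num_skills} core skills needed to follow user instructions well, "
        "as a JSON array of short skill names.\n"
        "Example instructions:\n" + "\n".join(seed_queries)
    )
    return json.loads(ask(prompt))


def generate_example(ask, skills):
    """Stage 2 (data generation): sample a random pair of skills and ask the
    LLM for one (instruction, response) pair exhibiting both; the random
    pairing is what promotes diversity and difficulty."""
    skill_a, skill_b = random.sample(skills, 2)
    prompt = (
        f"Write one challenging user instruction that requires both "
        f"'{skill_a}' and '{skill_b}', followed by a high-quality response. "
        "Return JSON with keys 'instruction' and 'response'."
    )
    return json.loads(ask(prompt))


# Usage sketch: build a small SFT set, e.g. ~4K examples as in the paper.
# skills = extract_skills(ask, seed_queries)
# sft_data = [generate_example(ask, skills) for _ in range(4000)]
```

The resulting (instruction, response) pairs would then be used directly for vanilla SFT, with no preference-optimization or RL stage.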
Cite
Text
Kaur et al. "Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning." NeurIPS 2024 Workshops: Compositional_Learning, 2024.
Markdown
[Kaur et al. "Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning." NeurIPS 2024 Workshops: Compositional_Learning, 2024.](https://mlanthology.org/neuripsw/2024/kaur2024neuripsw-instructskillmix/)
BibTeX
@inproceedings{kaur2024neuripsw-instructskillmix,
title = {{Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning}},
author = {Kaur, Simran and Park, Simon and Goyal, Anirudh and Arora, Sanjeev},
booktitle = {NeurIPS 2024 Workshops: Compositional_Learning},
year = {2024},
url = {https://mlanthology.org/neuripsw/2024/kaur2024neuripsw-instructskillmix/}
}