Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning

Simran Kaur, Simon Park, Anirudh Goyal, Sanjeev Arora

ICLR 2025

Abstract

We introduce INSTRUCT-SKILLMIX, an automated approach for creating diverse, high-quality SFT data for instruction following. The pipeline involves two stages, each leveraging an existing powerful LLM: (1) Skill extraction: uses the LLM to extract core “skills” for instruction following by directly prompting the model. This is inspired by the “LLM metacognition” of Didolkar et al. (2024); (2) Data generation: uses the powerful LLM to generate (instruction, response) data that exhibit a randomly chosen pair of these skills. Here, the use of random skill combinations promotes diversity and difficulty. The estimated cost of creating the dataset is under $600. Vanilla SFT (i.e., no PPO, DPO, or RL methods) on data generated from INSTRUCT-SKILLMIX leads to strong gains on instruction-following benchmarks such as AlpacaEval 2.0, MT-Bench, and WildBench. With just 4K examples, LLaMA-3-8B-Base achieves a 42.76% length-controlled win rate on AlpacaEval 2.0, a level similar to frontier models like Claude 3 Opus and LLaMA-3.1-405B-Instruct. Ablation studies also suggest plausible reasons why creating open instruction-tuning datasets via naive crowd-sourcing has proved difficult. In our dataset, adding 20% low-quality answers (“shirkers”) causes a noticeable degradation in performance. The INSTRUCT-SKILLMIX pipeline seems flexible and adaptable to other settings.
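The data-generation stage described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the prompt wording, and the `call_llm` callable (a stand-in for whatever client queries the powerful LLM) are all assumptions made for the example. It shows only the core idea of drawing a random pair of distinct skills for each example and asking the model for an (instruction, response) pair that exercises both.

```python
import random


def sample_skill_pairs(skills, n_examples, seed=0):
    """Draw n_examples random pairs of two distinct skills (seeded for repeatability)."""
    rng = random.Random(seed)
    return [tuple(rng.sample(skills, 2)) for _ in range(n_examples)]


def build_generation_prompt(skill_a, skill_b):
    # Hypothetical prompt wording; the paper's actual prompts are not reproduced here.
    return (
        "Write an instruction that requires both of these skills: "
        f"'{skill_a}' and '{skill_b}'. Then write a high-quality response."
    )


def generate_dataset(skills, n_examples, call_llm, seed=0):
    """Build one record per sampled skill pair.

    call_llm is an assumed callable (prompt -> text); plug in any LLM client.
    """
    data = []
    for skill_a, skill_b in sample_skill_pairs(skills, n_examples, seed):
        prompt = build_generation_prompt(skill_a, skill_b)
        data.append({"skills": (skill_a, skill_b), "text": call_llm(prompt)})
    return data
```

In this sketch the skill list would come from the skill-extraction stage, and passing a stub such as `lambda prompt: prompt` for `call_llm` is enough to inspect which skill pairs are drawn before spending any API budget.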

Cite

Text

Kaur et al. "Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning." International Conference on Learning Representations, 2025.

Markdown

[Kaur et al. "Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/kaur2025iclr-instructskillmix/)

BibTeX

@inproceedings{kaur2025iclr-instructskillmix,
  title     = {{Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning}},
  author    = {Kaur, Simran and Park, Simon and Goyal, Anirudh and Arora, Sanjeev},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/kaur2025iclr-instructskillmix/}
}