A Case Study of Instruction Tuning with Mixture of Parameter-Efficient Experts
Abstract
We study the applicability of mixtures of parameter-efficient experts (MoPEs) to instruction-tuning large decoder-only language models. Recent literature indicates that MoPEs might enhance performance on specific multi-task instruction-following datasets. In this paper, we extend these results and study the applicability of MoPEs in previously overlooked settings: a) with open-domain instruction-following datasets; b) with recent decoder-only models; and c) with downstream out-of-distribution test sets. We build on top of LLaMA1-13B/-7B and LLaMA2-13B. We study different variants of learned routing, namely per-example routing ([PE]) and the more expensive per-token routing ([PT]). Overall, we are unable to substantiate in our setting the strong performance gains observed in related studies. We observe occasional improvements for LLaMA2 fine-tuned on the Open Platypus dataset in 0-shot SNI evaluation, and in TruthfulQA evaluation after fine-tuning on a subset of Flan. We shed some light on the inner workings of MoPEs by comparing different routing strategies. We find that [PE] routing tends to collapse at downstream evaluation time, reducing the importance of the router's application. We plan to publicly release our code.
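To make the distinction between [PE] and [PT] routing concrete, the following is a minimal sketch and not the authors' implementation. It assumes a linear router with one weight row per expert, and it assumes that per-example routing pools the token vectors by averaging; the abstract does not specify either detail. All function names (`router_logits`, `per_example_routing`, `per_token_routing`) are hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def router_logits(vector, router_weights):
    # One logit per expert: dot product of the input with that expert's router row.
    # (Assumed linear router; hypothetical, for illustration only.)
    return [sum(v * w for v, w in zip(vector, row)) for row in router_weights]

def per_example_routing(tokens, router_weights):
    # [PE]: a single expert distribution for the whole sequence.
    # Assumption: the sequence is summarized by the mean of its token vectors.
    dim = len(tokens[0])
    mean = [sum(tok[i] for tok in tokens) / len(tokens) for i in range(dim)]
    probs = softmax(router_logits(mean, router_weights))
    return [probs for _ in tokens]  # every token shares the same weights

def per_token_routing(tokens, router_weights):
    # [PT]: a separate expert distribution for each token,
    # which costs one router evaluation per token instead of one per example.
    return [softmax(router_logits(tok, router_weights)) for tok in tokens]

# Toy input: two tokens of dimension 2, routed over two experts.
tokens = [[1.0, 0.0], [0.0, 1.0]]
router_weights = [[2.0, 0.0], [0.0, 2.0]]

pe = per_example_routing(tokens, router_weights)
pt = per_token_routing(tokens, router_weights)
```

In this toy input, `pe` assigns the same expert weights to both tokens, while `pt` lets each token favor a different expert. The resulting weights would then scale the outputs of the parameter-efficient experts before they are combined.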
Cite
Text
Ostapenko et al. "A Case Study of Instruction Tuning with Mixture of Parameter-Efficient Experts." NeurIPS 2023 Workshops: Instruction, 2023.
Markdown
[Ostapenko et al. "A Case Study of Instruction Tuning with Mixture of Parameter-Efficient Experts." NeurIPS 2023 Workshops: Instruction, 2023.](https://mlanthology.org/neuripsw/2023/ostapenko2023neuripsw-case/)
BibTeX
@inproceedings{ostapenko2023neuripsw-case,
title = {{A Case Study of Instruction Tuning with Mixture of Parameter-Efficient Experts}},
author = {Ostapenko, Oleksiy and Caccia, Lucas and Su, Zhan and Le Roux, Nicolas and Charlin, Laurent and Sordoni, Alessandro},
booktitle = {NeurIPS 2023 Workshops: Instruction},
year = {2023},
url = {https://mlanthology.org/neuripsw/2023/ostapenko2023neuripsw-case/}
}