Projected Language Models: A Large Model Pre-Segmented into Smaller Ones
Abstract
Large language models are versatile tools but are not suitable for small inference budgets. Small models offer more efficient inference, but their lower capacity means they perform well only when their scope is limited to a specialized domain. This paper explores how to obtain a small language model with good specialized accuracy, even when the specialization data is unknown during pretraining. We propose a novel architecture, projected networks (PN): a high-capacity network whose parameters can be linearly projected into a small network for fine-tuning. We assess the empirical effectiveness of our solution compared to small model training, distillation, and hard mixtures of experts.
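The core idea stated in the abstract, that the parameters of a high-capacity network can be linearly projected into a small network before fine-tuning, can be illustrated with a minimal sketch. The shapes, the projection matrices P_out and P_in, and all variable names below are illustrative assumptions, not the paper's actual construction.

```python
# Minimal, hypothetical sketch of "linearly projecting" a large weight matrix
# into a small one. All dimensions and the projection scheme are assumptions
# for illustration only, not the method described in the paper.
import numpy as np

rng = np.random.default_rng(0)

D_large, D_small = 1024, 256  # assumed hidden sizes of the large and small models

# One parameter matrix of the pretrained high-capacity network.
W_large = rng.standard_normal((D_large, D_large)) / np.sqrt(D_large)

# Fixed linear maps that carve the large matrix into a small one.
P_out = rng.standard_normal((D_small, D_large)) / np.sqrt(D_large)
P_in = rng.standard_normal((D_small, D_large)) / np.sqrt(D_large)

# The small model's weight is a linear projection of the large model's weight.
W_small = P_out @ W_large @ P_in.T  # shape: (D_small, D_small)

# Only the projected (D_small x D_small) parameters would then be fine-tuned
# and used at inference time, keeping the inference budget small.
print(W_small.shape)  # (256, 256)
```

In this sketch, fine-tuning would update only W_small (or the projections), so specialized inference never touches the full large-model parameters; how the projections are chosen and trained is exactly what the paper's PN architecture addresses.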
Cite
Text
Grangier et al. "Projected Language Models: A Large Model Pre-Segmented into Smaller Ones." ICML 2024 Workshops: FM-Wild, 2024.
Markdown
[Grangier et al. "Projected Language Models: A Large Model Pre-Segmented into Smaller Ones." ICML 2024 Workshops: FM-Wild, 2024.](https://mlanthology.org/icmlw/2024/grangier2024icmlw-projected/)
BibTeX
@inproceedings{grangier2024icmlw-projected,
title = {{Projected Language Models: A Large Model Pre-Segmented into Smaller Ones}},
author = {Grangier, David and Katharopoulos, Angelos and Ablin, Pierre and Hannun, Awni},
booktitle = {ICML 2024 Workshops: FM-Wild},
year = {2024},
url = {https://mlanthology.org/icmlw/2024/grangier2024icmlw-projected/}
}