Let the LLM Stick to Its Strengths: Learning to Route Economical LLM
Abstract
Recently, test-time scaling of Large Language Models (LLMs) has emerged as a practical alternative to parameter and data scaling. Reasoning tasks often require large-scale, RLVR-based LLMs, while more economical LLMs can handle simpler tasks. Routing each query to an LLM based on its *suitability* (*i.e.*, capability and cost) preserves usability while improving efficiency. We introduce LLMRec, which routes the most suitable LLM to each user query without pre-inference over the candidate LLM zoo. It is the first to reframe the LLM routing problem as a comprehensive recommendation system (RecSys) task. Our core insight is that an LLM's suitability for a query is a complex, latent signal analogous to user-item preference. LLMRec systematically engineers features for candidate LLMs (intrinsic attributes and capability distributions), queries (general semantics and meta-dimensional information), and context (inference type, cost budgets), and incorporates behavioral features to learn high-order interactions. LLMRec is designed to generalize to out-of-domain datasets and to adapt to new LLMs as the model zoo evolves. We define the evaluation metric via the Pareto frontier under user-specified cost budgets. Across six datasets, LLMRec reduces cost by over 38% on average while maintaining accuracy and consistently outperforming baselines in converging toward the Pareto frontier.
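To make the recommendation framing concrete, here is a minimal sketch of routing as budget-constrained suitability scoring. All names, features, and numbers below are illustrative assumptions, not the paper's actual model: the learned high-order interaction model is stood in for by a simple dot product between query task weights and an LLM's capability distribution.

```python
# Hypothetical sketch of suitability-based LLM routing under a cost budget.
# Features and scores are illustrative; the paper learns these interactions.
from dataclasses import dataclass, field

@dataclass
class CandidateLLM:
    name: str
    cost_per_query: float            # intrinsic attribute (e.g. USD per query)
    capability: dict = field(default_factory=dict)  # capability distribution

def suitability(llm, query_feats, budget):
    """Score one LLM for a query; disqualify it if it exceeds the budget."""
    if llm.cost_per_query > budget:
        return float("-inf")
    # Stand-in for the learned interaction model: dot product of the
    # query's task-type weights with the LLM's capability distribution.
    return sum(query_feats.get(k, 0.0) * v for k, v in llm.capability.items())

def route(zoo, query_feats, budget):
    """Pick the most suitable LLM without running inference on any candidate."""
    return max(zoo, key=lambda m: suitability(m, query_feats, budget))

zoo = [
    CandidateLLM("large-reasoner", cost_per_query=2.0,
                 capability={"math": 0.9, "chat": 0.7}),
    CandidateLLM("small-chat", cost_per_query=0.1,
                 capability={"math": 0.3, "chat": 0.8}),
]
# A simple chat query under a tight budget routes to the cheaper model.
print(route(zoo, {"chat": 1.0}, budget=0.5).name)  # -> small-chat
```

The budget acts as a hard filter here; the actual system evaluates routers by how closely their accuracy–cost trade-off converges toward the Pareto frontier.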
Cite
Text

Zhang et al. "Let the LLM Stick to Its Strengths: Learning to Route Economical LLM." Advances in Neural Information Processing Systems, 2025.

Markdown

[Zhang et al. "Let the LLM Stick to Its Strengths: Learning to Route Economical LLM." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/zhang2025neurips-let/)

BibTeX
@inproceedings{zhang2025neurips-let,
title = {{Let the LLM Stick to Its Strengths: Learning to Route Economical LLM}},
author = {Zhang, Yi-Kai and Lu, Shiyin and Chen, Qing-Guo and Luo, Weihua and Zhan, De-Chuan and Ye, Han-Jia},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
url = {https://mlanthology.org/neurips/2025/zhang2025neurips-let/}
}