Universal LLM Routing with Correctness-Based Representation

Abstract

Large language models’ rapid advances in capability have been accompanied by substantial increases in inference cost. Model routing is a simple technique for reducing inference cost: one maintains a pool of candidate LLMs and learns to route each prompt to the smallest feasible LLM. Existing works focus on learning a router for a fixed pool of LLMs. In this paper, we consider the problem of dynamic routing, where new, previously unobserved LLMs are available at test time. We propose a new approach to this problem that represents each LLM as a feature vector derived from its predictions on a set of representative prompts. Building on this representation, we detail an effective cluster-based routing strategy, and prove that it estimates a theoretically optimal routing rule. Experiments on a range of public benchmarks show the effectiveness of the proposal in routing amongst more than 30 unseen LLMs.
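To make the abstract's idea concrete, the following is a minimal sketch (not the paper's actual method or code) of how correctness-based representation and cluster-based routing could fit together: each LLM is summarized by a 0/1 correctness vector over representative prompts, prompts are grouped by a simple k-means over their embeddings, and a new prompt is routed to the cheapest LLM whose estimated per-cluster accuracy clears a threshold. All function names, the clustering choice, and the threshold rule here are illustrative assumptions.

```python
import numpy as np

def correctness_vectors(eval_results):
    # eval_results: {llm_name: list of 0/1 correctness on representative prompts}
    return {name: np.asarray(v, dtype=float) for name, v in eval_results.items()}

def cluster_prompts(prompt_embeddings, k, iters=50, seed=0):
    # Plain k-means over prompt embeddings (an illustrative clustering choice).
    rng = np.random.default_rng(seed)
    centers = prompt_embeddings[rng.choice(len(prompt_embeddings), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(prompt_embeddings[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = prompt_embeddings[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers, labels

def cluster_accuracy(corr_vec, labels, k):
    # Mean correctness of one LLM within each prompt cluster.
    return np.array([corr_vec[labels == j].mean() if (labels == j).any() else 0.0
                     for j in range(k)])

def route(prompt_emb, centers, per_llm_cluster_acc, costs, threshold=0.7):
    # Assign the new prompt to its nearest cluster, then pick the cheapest LLM
    # whose estimated accuracy on that cluster passes the threshold.
    j = np.linalg.norm(centers - prompt_emb, axis=1).argmin()
    feasible = [(costs[name], name)
                for name, acc in per_llm_cluster_acc.items() if acc[j] >= threshold]
    if feasible:
        return min(feasible)[1]
    # Fallback: the most accurate LLM on this cluster.
    return max(per_llm_cluster_acc, key=lambda n: per_llm_cluster_acc[n][j])
```

Because a new LLM only needs its correctness vector on the representative prompts, it can be added to the pool at test time without retraining the router, which is the property the dynamic-routing setting requires.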

Cite

Text

Jitkrittum et al. "Universal LLM Routing with Correctness-Based Representation." ICLR 2025 Workshops: SCOPE, 2025.

Markdown

[Jitkrittum et al. "Universal LLM Routing with Correctness-Based Representation." ICLR 2025 Workshops: SCOPE, 2025.](https://mlanthology.org/iclrw/2025/jitkrittum2025iclrw-universal/)

BibTeX

@inproceedings{jitkrittum2025iclrw-universal,
  title     = {{Universal LLM Routing with Correctness-Based Representation}},
  author    = {Jitkrittum, Wittawat and Narasimhan, Harikrishna and Rawat, Ankit Singh and Juneja, Jeevesh and Wang, Zifeng and Lee, Chen-Yu and Shenoy, Pradeep and Panigrahy, Rina and Menon, Aditya Krishna and Kumar, Sanjiv},
  booktitle = {ICLR 2025 Workshops: SCOPE},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/jitkrittum2025iclrw-universal/}
}