Cost-Aware Contrastive Routing for LLMs

Abstract

We study cost-aware routing for large language models across diverse and dynamic pools of models. Existing approaches often overlook prompt-specific context, rely on expensive model profiling, assume a fixed set of experts, or use inefficient trial-and-error strategies. We introduce Cost-Spectrum Contrastive Routing (CSCR), a lightweight framework that maps both prompts and models into a shared embedding space to enable fast, cost-sensitive selection. CSCR uses compact, fast-to-compute logit footprints for open-source models and perplexity fingerprints for black-box APIs. A contrastive encoder is trained to favor the cheapest accurate expert within adaptive cost bands. At inference time, routing reduces to a single $k$-NN lookup via a FAISS index, requiring no retraining when the expert pool changes and enabling microsecond-scale routing latency. Across multiple benchmarks, CSCR consistently outperforms baselines, improving the accuracy–cost tradeoff by up to 25%, while generalizing robustly to unseen LLMs and out-of-distribution prompts.
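
The inference step described above is simple enough to sketch in a few lines. Below is a minimal illustration, assuming a trained prompt encoder already exists: expert embeddings (derived from logit footprints or perplexity fingerprints) sit in a FAISS inner-product index, a prompt embedding issues a single $k$-NN query, and a cost-aware rule selects among the retrieved neighbors. The names (`expert_embs`, `expert_costs`, `route`) and the random placeholder vectors are illustrative, not the paper's actual interface; in CSCR the cost preference is learned by the contrastive encoder rather than applied as the hard post-hoc tie-break used here.

```python
# Sketch of CSCR-style inference: a single k-NN lookup over expert embeddings.
# All embeddings and costs below are random placeholders (assumptions), standing
# in for the trained encoder's outputs and real per-expert pricing.
import numpy as np
import faiss

rng = np.random.default_rng(0)
d = 128            # embedding dimension (assumed)
num_experts = 16   # size of the expert pool (assumed)

# Placeholder expert embeddings; in CSCR these would come from logit
# footprints (open-source models) or perplexity fingerprints (black-box APIs).
expert_embs = rng.standard_normal((num_experts, d)).astype("float32")
faiss.normalize_L2(expert_embs)           # unit vectors -> inner product = cosine

index = faiss.IndexFlatIP(d)              # exact inner-product index
index.add(expert_embs)

expert_costs = rng.random(num_experts)    # illustrative per-expert cost, e.g. $/token

def route(prompt_emb: np.ndarray, k: int = 5) -> int:
    """Return the cheapest expert among the k nearest neighbors of the prompt."""
    q = np.ascontiguousarray(prompt_emb, dtype="float32").reshape(1, -1)
    faiss.normalize_L2(q)
    _, ids = index.search(q, k)           # the single k-NN lookup
    candidates = ids[0]
    # Hypothetical decision rule: among similar experts, prefer the cheapest.
    return int(candidates[np.argmin(expert_costs[candidates])])

# Example: route one (random placeholder) prompt embedding.
chosen = route(rng.standard_normal(d))
print(f"routed prompt to expert {chosen}")
```

Because the expert pool lives entirely in the index, adding or removing a model is just an index update (another `index.add` or a rebuild) with no encoder retraining, which is consistent with the paper's claim that routing adapts to a changing pool.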

Cite

Text

Shirkavand et al. "Cost-Aware Contrastive Routing for LLMs." Advances in Neural Information Processing Systems, 2025.

Markdown

[Shirkavand et al. "Cost-Aware Contrastive Routing for LLMs." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/shirkavand2025neurips-costaware/)

BibTeX

@inproceedings{shirkavand2025neurips-costaware,
  title     = {{Cost-Aware Contrastive Routing for LLMs}},
  author    = {Shirkavand, Reza and Gao, Shangqian and Yu, Peiran and Huang, Heng},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/shirkavand2025neurips-costaware/}
}