RouterDC: Query-Based Router by Dual Contrastive Learning for Assembling Large Language Models
Abstract
Recent works show that assembling multiple off-the-shelf large language models (LLMs) can harness their complementary abilities. To achieve this, routing is a promising method, which learns a router to select the most suitable LLM for each query. However, existing routing models are ineffective when multiple LLMs perform well for a query. To address this problem, in this paper, we propose a method called query-based Router by Dual Contrastive learning (RouterDC). The RouterDC model, which consists of an encoder and LLM embeddings, is trained by two proposed contrastive losses (sample-LLM and sample-sample losses). Experimental results show that RouterDC is effective in assembling LLMs and largely outperforms individual top-performing LLMs as well as existing routing methods on both in-distribution (+2.76%) and out-of-distribution (+1.90%) tasks. The source code is available at https://github.com/shuhao02/RouterDC.
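To make the described architecture concrete, below is a minimal PyTorch sketch of a query-based router with learnable LLM embeddings and an InfoNCE-style sample-LLM contrastive loss. The class and function names (`QueryRouter`, `sample_llm_loss`), the cosine-similarity scoring, and the temperature value are illustrative assumptions, not the authors' exact implementation; see the linked repository for the official code. The sample-sample loss, which additionally contrasts similar against dissimilar queries, is omitted for brevity.

```python
import torch
import torch.nn.functional as F

class QueryRouter(torch.nn.Module):
    """Hypothetical router: a text encoder plus one learnable embedding per LLM."""

    def __init__(self, encoder: torch.nn.Module, num_llms: int, dim: int):
        super().__init__()
        self.encoder = encoder  # maps a batch of queries to (batch, dim) vectors
        self.llm_emb = torch.nn.Parameter(torch.randn(num_llms, dim))

    def forward(self, query_vecs: torch.Tensor) -> torch.Tensor:
        # Cosine similarity between each encoded query and every LLM embedding.
        q = F.normalize(query_vecs, dim=-1)
        k = F.normalize(self.llm_emb, dim=-1)
        return q @ k.t()  # (batch, num_llms) routing scores

def sample_llm_loss(scores: torch.Tensor, pos_idx: torch.Tensor,
                    temperature: float = 0.07) -> torch.Tensor:
    # InfoNCE-style objective: pull each query toward the embedding of an LLM
    # that answers it well (pos_idx) and push it away from the other LLMs.
    return F.cross_entropy(scores / temperature, pos_idx)
```

At inference time, routing would simply pick `scores.argmax(dim=-1)` and dispatch the query to the corresponding LLM.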
Cite
Text
Chen et al. "RouterDC: Query-Based Router by Dual Contrastive Learning for Assembling Large Language Models." Neural Information Processing Systems, 2024. doi:10.52202/079017-2120
Markdown
[Chen et al. "RouterDC: Query-Based Router by Dual Contrastive Learning for Assembling Large Language Models." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/chen2024neurips-routerdc/) doi:10.52202/079017-2120
BibTeX
@inproceedings{chen2024neurips-routerdc,
title = {{RouterDC: Query-Based Router by Dual Contrastive Learning for Assembling Large Language Models}},
author = {Chen, Shuhao and Jiang, Weisen and Lin, Baijiong and Kwok, James T. and Zhang, Yu},
booktitle = {Neural Information Processing Systems},
year = {2024},
doi = {10.52202/079017-2120},
url = {https://mlanthology.org/neurips/2024/chen2024neurips-routerdc/}
}