FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance
Abstract
The rapid adoption of large language models (LLMs) has led to a growing number of companies offering generative LLMs as callable services at varying costs. We find that popular generative LLM APIs, such as GPT-4, Gemini 1.5, and Claude 3.5, exhibit heterogeneous pricing structures, with fees that can differ by two orders of magnitude, as well as heterogeneous performance across tasks and input queries. This makes it challenging for users to decide which generative LLM APIs to utilize for their applications and budget. Motivated by these findings, we propose FrugalGPT, an algorithmic framework that adaptively selects which generative LLMs to use for different queries to reduce cost and improve accuracy. Our experiments demonstrate that, for a range of natural language tasks including news classification, reading comprehension, and scientific question answering, FrugalGPT can match the performance of the best individual generative LLM (e.g., GPT-4) with up to a 98% cost reduction or improve the accuracy over GPT-4 by 4% at the same cost. The ideas and findings presented in this paper lay a foundation for using LLMs sustainably and efficiently.
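The adaptive selection described in the abstract can be illustrated as a cost-aware cascade: cheaper models are tried first, and a query escalates to a more expensive model only when a scorer's confidence falls below a threshold. The sketch below is a minimal illustration of that idea, not the paper's implementation; the model functions, costs, and confidence values are all hypothetical stand-ins.

```python
# Illustrative LLM cascade: try cheap models first, escalate when the
# scored confidence of an answer falls below a threshold. All model
# callables, costs, and confidences here are toy stand-ins.

def cascade(query, models, threshold=0.8):
    """Return (answer, total_cost): the answer from the first model whose
    confidence clears the threshold, falling back to the last (most
    capable) model if none does."""
    total_cost = 0.0
    answer = None
    for call, cost in models:          # models ordered cheapest first
        answer, confidence = call(query)
        total_cost += cost
        if confidence >= threshold:    # good enough -> stop escalating
            break
    return answer, total_cost

# Toy stand-ins for LLM API calls, each returning (answer, confidence).
cheap = lambda q: ("cheap answer", 0.5)
strong = lambda q: ("strong answer", 0.95)

answer, cost = cascade("example query", [(cheap, 1.0), (strong, 60.0)])
```

Here the cheap model's confidence (0.5) misses the threshold, so the query escalates and both calls are paid for; a query the cheap model answers confidently would cost only 1.0, which is the source of the cost savings the abstract reports.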
Cite
Text
Chen et al. "FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance." Transactions on Machine Learning Research, 2024.
Markdown
[Chen et al. "FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/chen2024tmlr-frugalgpt/)
BibTeX
@article{chen2024tmlr-frugalgpt,
title = {{FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance}},
author = {Chen, Lingjiao and Zaharia, Matei and Zou, James},
journal = {Transactions on Machine Learning Research},
year = {2024},
url = {https://mlanthology.org/tmlr/2024/chen2024tmlr-frugalgpt/}
}