APIGen: Automated PIpeline for Generating Verifiable and Diverse Function-Calling Datasets
Abstract
The advancement of function-calling agent models requires diverse, reliable, and high-quality datasets. This paper presents APIGen, an automated data generation pipeline designed to synthesize high-quality datasets for function-calling applications. Using APIGen, we collect 3,673 executable APIs across 21 categories and generate diverse function-calling datasets in a scalable and structured manner. Each data point in our dataset is verified through three hierarchical stages: format checking, actual function execution, and semantic verification, improving its reliability and correctness. We demonstrate that models trained with our curated datasets, even with only 7B parameters, can achieve state-of-the-art performance on the Berkeley Function-Calling Benchmark, outperforming multiple GPT-4 models. Moreover, our 1B model also performs strongly, surpassing GPT-3.5-Turbo and Claude-3 Haiku. We release a dataset containing 60,000 high-quality entries to advance research on function-calling agents. The dataset and models are available on the project homepage: https://apigen-pipeline.github.io/
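The three hierarchical verification stages named above (format checking, function execution, semantic verification) can be illustrated with a minimal Python sketch in which each stage filters out a generated sample before it reaches the next. Everything concrete here is an assumption for illustration, not the authors' implementation: the JSON answer format (a list of `{"name", "arguments"}` objects), the toy `API_REGISTRY` with `get_weather` and `add`, and the helper names `format_check`, `execution_check`, `verify`, and `toy_judge` are all hypothetical. The semantic stage is reduced to a pluggable `judge` callable; a real pipeline might place a model-based checker there.

```python
from __future__ import annotations

import inspect
import json
from dataclasses import dataclass
from typing import Any, Callable


# Hypothetical API library: name -> Python callable standing in for an executable API.
def get_weather(city: str, unit: str = "celsius") -> dict:
    return {"city": city, "temp": 21, "unit": unit}  # toy data, not a real service


def add(a: int, b: int) -> int:
    return a + b


API_REGISTRY: dict[str, Callable[..., Any]] = {"get_weather": get_weather, "add": add}

# Hypothetical semantic judge: (query, parsed calls, execution results) -> accept?
Judge = Callable[[str, list, list], bool]


@dataclass
class Verdict:
    passed: bool
    stage: str  # stage where the sample stopped: "format", "execution" or "semantic"
    reason: str = ""


def format_check(raw: str) -> tuple[list[dict] | None, str]:
    """Stage 1: non-empty JSON list of {"name", "arguments"} calls to known APIs."""
    try:
        calls = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc}"
    if not isinstance(calls, list) or not calls:
        return None, "answer must be a non-empty list of calls"
    for call in calls:
        if not isinstance(call, dict) or set(call) != {"name", "arguments"}:
            return None, f"malformed call: {call!r}"
        fn = API_REGISTRY.get(call["name"]) if isinstance(call["name"], str) else None
        if fn is None:
            return None, f"unknown API: {call['name']!r}"
        if not isinstance(call["arguments"], dict):
            return None, "arguments must be a JSON object"
        try:
            # Checks parameter names and required parameters only, not types.
            inspect.signature(fn).bind(**call["arguments"])
        except TypeError as exc:
            return None, f"bad arguments for {call['name']}: {exc}"
    return calls, ""


def execution_check(calls: list[dict]) -> tuple[list | None, str]:
    """Stage 2: actually run every call; any exception rejects the sample."""
    results = []
    for call in calls:
        try:
            results.append(API_REGISTRY[call["name"]](**call["arguments"]))
        except Exception as exc:  # a real pipeline would also sandbox and time out calls
            return None, f"{call['name']} raised {type(exc).__name__}: {exc}"
    return results, ""


def verify(query: str, raw_answer: str, judge: Judge) -> Verdict:
    """Run the three stages in order, stopping at the first failure."""
    calls, err = format_check(raw_answer)
    if calls is None:
        return Verdict(False, "format", err)
    results, err = execution_check(calls)
    if results is None:
        return Verdict(False, "execution", err)
    if not judge(query, calls, results):
        return Verdict(False, "semantic", "judge rejected the calls or results")
    return Verdict(True, "semantic")


# Hypothetical stand-in for a semantic checker: accept if every call returned something.
def toy_judge(query: str, calls: list, results: list) -> bool:
    return all(r is not None for r in results)


# Passes all three stages.
print(verify("Weather in Paris?", '[{"name": "get_weather", "arguments": {"city": "Paris"}}]', toy_judge))
# Passes the format check but fails at execution: 1 + "2" raises TypeError.
print(verify("What is 1 + 2?", '[{"name": "add", "arguments": {"a": 1, "b": "2"}}]', toy_judge))
```

Ordering the stages from cheapest to most expensive means malformed samples are discarded before any API is executed, and only samples that actually run are passed to the (typically costlier) semantic check.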
Cite
Text
Liu et al. "APIGen: Automated PIpeline for Generating Verifiable and Diverse Function-Calling Datasets." Neural Information Processing Systems, 2024. doi:10.52202/079017-1725
Markdown
[Liu et al. "APIGen: Automated PIpeline for Generating Verifiable and Diverse Function-Calling Datasets." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/liu2024neurips-apigen/) doi:10.52202/079017-1725
BibTeX
@inproceedings{liu2024neurips-apigen,
title = {{APIGen: Automated PIpeline for Generating Verifiable and Diverse Function-Calling Datasets}},
author = {Liu, Zuxin and Hoang, Thai and Zhang, Jianguo and Zhu, Ming and Lan, Tian and Kokane, Shirley and Tan, Juntao and Yao, Weiran and Liu, Zhiwei and Feng, Yihao and Murthy, Rithesh and Yang, Liangwei and Savarese, Silvio and Niebles, Juan Carlos and Wang, Huan and Heinecke, Shelby and Xiong, Caiming},
booktitle = {Neural Information Processing Systems},
year = {2024},
doi = {10.52202/079017-1725},
url = {https://mlanthology.org/neurips/2024/liu2024neurips-apigen/}
}