MTU-Bench: A Multi-Granularity Tool-Use Benchmark for Large Language Models

Abstract

Large Language Models (LLMs) have displayed massive improvements in reasoning and decision-making skills and can hold natural conversations with users. Recently, many tool-use benchmark datasets have been proposed. However, existing datasets have the following limitations: (1) insufficient evaluation scenarios (e.g., they only cover limited tool-use scenes); and (2) extensive evaluation costs (e.g., GPT API costs). To address these limitations, we propose MTU-Bench, a multi-granularity tool-use benchmark for large language models. For the "multi-granularity" property, MTU-Bench covers five tool-usage scenes (i.e., single-turn single-tool, single-turn multiple-tool, multiple-turn single-tool, multiple-turn multiple-tool, and out-of-distribution tasks). Moreover, all evaluation metrics of MTU-Bench are computed from the prediction results and the ground truth, without relying on any GPT-based or human evaluation. MTU-Bench is collected by transforming existing high-quality datasets to simulate real-world tool-usage scenarios, and we also propose an instruction dataset called MTU-Instruct to enhance the tool-use abilities of existing LLMs. Comprehensive experimental results demonstrate the effectiveness of MTU-Bench.
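To make the GPT-free evaluation idea concrete, below is a minimal sketch of what ground-truth-based tool-use metrics can look like: comparing each predicted tool call (tool name plus arguments) directly against a reference call. The data format, function names, and the two metrics shown (tool-selection accuracy and exact-match of arguments) are illustrative assumptions, not the paper's actual metric definitions.

# Hypothetical sketch of ground-truth-based tool-use metrics, in the
# spirit of MTU-Bench's GPT-free evaluation. The call format and metric
# names here are assumptions for illustration.

def tool_selection_accuracy(predictions, references):
    """Fraction of turns where the predicted tool name matches the reference."""
    correct = sum(p["tool"] == r["tool"] for p, r in zip(predictions, references))
    return correct / len(references)

def parameter_exact_match(predictions, references):
    """Fraction of turns where both the tool name and its full argument dict match."""
    correct = sum(
        p["tool"] == r["tool"] and p["args"] == r["args"]
        for p, r in zip(predictions, references)
    )
    return correct / len(references)

if __name__ == "__main__":
    # Toy single-turn, single-tool examples (hypothetical tool names).
    preds = [{"tool": "get_weather", "args": {"city": "Paris"}},
             {"tool": "set_alarm", "args": {"time": "07:00"}}]
    refs = [{"tool": "get_weather", "args": {"city": "Paris"}},
            {"tool": "set_alarm", "args": {"time": "06:30"}}]
    print(tool_selection_accuracy(preds, refs))  # 1.0: both tool names correct
    print(parameter_exact_match(preds, refs))    # 0.5: one argument dict differs

Because such metrics reduce to deterministic comparisons against annotated ground truth, they can be run locally at no API cost, which is the practical advantage the abstract highlights.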

Cite

Text

Wang et al. "MTU-Bench: A Multi-Granularity Tool-Use Benchmark for Large Language Models." International Conference on Learning Representations, 2025.

Markdown

[Wang et al. "MTU-Bench: A Multi-Granularity Tool-Use Benchmark for Large Language Models." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/wang2025iclr-mtubench/)

BibTeX

@inproceedings{wang2025iclr-mtubench,
  title     = {{MTU-Bench: A Multi-Granularity Tool-Use Benchmark for Large Language Models}},
  author    = {Wang, Pei and Wu, Yanan and Wang, Noah and Liu, Jiaheng and Song, Xiaoshuai and Peng, Z.Y. and Deng, Ken and Zhang, Chenchen and Wang, Jiakai and Peng, Junran and Zhang, Ge and Guo, Hangyu and Zhang, Zhaoxiang and Su, Wenbo and Zheng, Bo},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/wang2025iclr-mtubench/}
}