BinMetric: A Comprehensive Binary Code Analysis Benchmark for Large Language Models
Abstract
Binary analysis is crucial for software security, offering insight into compiled programs when source code is unavailable. As large language models (LLMs) excel at language tasks, their potential for decoding complex binary data structures is growing. However, the lack of standardized benchmarks hinders both their evaluation and progress in this domain. To bridge this gap, we introduce BinMetric, the first comprehensive benchmark designed specifically to evaluate the performance of LLMs on binary analysis tasks. BinMetric comprises 1,000 questions derived from 20 real-world open-source projects and spans 6 practical binary analysis tasks, including decompilation and code summarization, that reflect actual reverse engineering scenarios. Our empirical study evaluates a range of state-of-the-art LLMs on this benchmark, revealing their strengths and limitations. The findings indicate that while LLMs show strong potential, challenges remain, particularly in precise binary lifting and assembly synthesis. In summary, BinMetric is a significant step toward measuring the binary analysis capabilities of LLMs and establishes a new benchmark leaderboard, and our study offers valuable insights for advancing LLMs in software security.
Cite
Shang et al. "BinMetric: A Comprehensive Binary Code Analysis Benchmark for Large Language Models." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/858
@inproceedings{shang2025ijcai-binmetric,
title = {{BinMetric: A Comprehensive Binary Code Analysis Benchmark for Large Language Models}},
author = {Shang, Xiuwei and Chen, Guoqiang and Cheng, Shaoyin and Wu, Benlong and Hu, Li and Li, Gangyang and Zhang, Weiming and Yu, Nenghai},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2025},
pages = {7715-7723},
doi = {10.24963/IJCAI.2025/858},
url = {https://mlanthology.org/ijcai/2025/shang2025ijcai-binmetric/}
}