Toward Engineering AGI: Benchmarking the Engineering Design Capabilities of LLMs
Abstract
Modern engineering, spanning electrical, mechanical, aerospace, civil, and computer disciplines, stands as a cornerstone of human civilization and the foundation of our society. However, engineering design poses a fundamentally different challenge for large language models (LLMs) compared with traditional textbook-style problem solving or factual question answering. Although existing benchmarks have driven progress in areas such as language understanding, code synthesis, and scientific problem solving, real-world engineering design demands the synthesis of domain knowledge, navigation of complex trade-offs, and management of the tedious processes that consume much of practicing engineers' time. Despite these shared challenges across engineering disciplines, no benchmark currently captures the unique demands of engineering design work. In this work, we introduce EngDesign, an Engineering Design benchmark that evaluates LLMs' abilities to perform practical design tasks across nine engineering domains. Unlike existing benchmarks that focus on factual recall or question answering, EngDesign uniquely emphasizes LLMs' ability to synthesize domain knowledge, reason under constraints, and generate functional, objective-oriented engineering designs. Each task in EngDesign represents a real-world engineering design problem, accompanied by a detailed task description specifying design goals, constraints, and performance requirements. EngDesign pioneers a simulation-based evaluation paradigm that moves beyond textbook knowledge to assess genuine engineering design capabilities and shifts evaluation from static answer checking to dynamic, simulation-driven functional verification, marking a crucial step toward realizing the vision of engineering Artificial General Intelligence (AGI).
Cite
Text
Guo et al. "Toward Engineering AGI: Benchmarking the Engineering Design Capabilities of LLMs." Advances in Neural Information Processing Systems, 2025.
Markdown
[Guo et al. "Toward Engineering AGI: Benchmarking the Engineering Design Capabilities of LLMs." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/guo2025neurips-engineering/)
BibTeX
@inproceedings{guo2025neurips-engineering,
title = {{Toward Engineering AGI: Benchmarking the Engineering Design Capabilities of LLMs}},
author = {Guo, Xingang and Li, Yaxin and Kong, XiangYi and Jiang, Yilan and Zhao, Xiayu and Gong, Zhihua and Zhang, Yufan and Li, Daixuan and Sang, Tianle and Zhu, Beixiao and Jun, Gregory and Huang, Yingbing and Liu, Yiqi and Xue, Yuqi and Kundu, Rahul Dev and Lim, Qi Jian and Zhao, Yizhou and Granger, Luke Alexander and Younis, Mohamed Badr and Keivan, Darioush and Sabharwal, Nippun and Sinha, Shreyanka and Agarwal, Prakhar and Vandyck, Kojo and Mai, Hanlin and Wang, Zichen and Venkatesh, Aditya and Barik, Ayush and Yang, Jiankun and Yue, Chongying and He, Jingjie and Wang, Libin and Xu, Licheng and Chen, Hao and Wang, Jinwen and Xu, Liujun and Shetty, Rushabh and Guo, Ziheng and Song, Dahui and Jha, Manvi and Liang, Weijie and Yan, Weiman and Zhang, Bryan and Karnoor, Sahil Bhandary and Zhang, Jialiang and Pandya, Rutva and Gong, Xinyi and Ganesh, Mithesh Ballae and Shi, Feize and Xu, Ruiling and Zhang, Yifan and Ouyang, Yanfeng and Qin, Lianhui and Rosenbaum, Elyse and Snyder, Corey and Seiler, Peter and Dullerud, Geir and Zhang, Xiaojia Shelly and Cheng, Zuofu and Hanumolu, Pavan Kumar and Huang, Jian and Kulkarni, Mayank and Namazifar, Mahdi and Zhang, Huan and Hu, Bin},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
url = {https://mlanthology.org/neurips/2025/guo2025neurips-engineering/}
}