MCP-SafetyBench: A Benchmark for Safety Evaluation of Large Language Models with Real-World MCP Servers

Abstract

Large language models (LLMs) are evolving into agentic systems that reason, plan, and operate external tools. The Model Context Protocol (MCP) is a key enabler of this transition, offering a standardized interface for connecting LLMs with heterogeneous tools and services. Yet MCP's openness and multi-server workflows introduce new safety risks that existing benchmarks fail to capture, as they focus on isolated attacks or lack real-world coverage. We present **MCP-SafetyBench**, a comprehensive benchmark built on real MCP servers that supports realistic multi-turn evaluation across five domains: browser automation, financial analysis, location navigation, repository management, and web search. It incorporates a unified taxonomy of 20 MCP attack types spanning server, host, and user sides, and includes tasks requiring multi-step reasoning and cross-server coordination under uncertainty. Using MCP-SafetyBench, we systematically evaluate leading open- and closed-source LLMs, revealing that all models remain vulnerable to MCP attacks, with a notable safety-utility trade-off. Our results highlight the urgent need for stronger defenses and establish MCP-SafetyBench as a foundation for diagnosing and mitigating safety risks in real-world MCP deployments. Our benchmark is available at https://github.com/xjzzzzzzzz/MCPSafety.

Cite

Text

Zong et al. "MCP-SafetyBench: A Benchmark for Safety Evaluation of Large Language Models with Real-World MCP Servers." International Conference on Learning Representations, 2026.

Markdown

[Zong et al. "MCP-SafetyBench: A Benchmark for Safety Evaluation of Large Language Models with Real-World MCP Servers." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zong2026iclr-mcpsafetybench/)

BibTeX

@inproceedings{zong2026iclr-mcpsafetybench,
  title     = {{MCP-SafetyBench: A Benchmark for Safety Evaluation of Large Language Models with Real-World MCP Servers}},
  author    = {Zong, Xuanjun and Shen, Zhiqi and Wang, Lei and Lan, Yunshi and Yang, Chao},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/zong2026iclr-mcpsafetybench/}
}