MS-Bench: Evaluating LMMs in Ancient Manuscript Study Through a Dunhuang Case Study
Abstract
Analyzing ancient manuscripts has traditionally been a labor-intensive and time-consuming task for philologists. While recent advancements in LMMs have demonstrated their potential across diverse domains, their effectiveness in manuscript study remains underexplored. In this paper, we introduce MS-Bench, the first comprehensive benchmark co-developed with archaeologists, comprising 5,076 high-resolution images from the 4th to the 14th century and 9,982 expert-curated questions across nine sub-tasks aligned with archaeological workflows. Using four prompting strategies, we systematically evaluate 32 LMMs on their effectiveness, robustness, and cultural contextualization. Our analysis reveals scale-driven improvements in performance and reliability, the impact of prompting strategies on performance (CoT has a two-sided effect, while visual retrieval-augmented prompts provide a consistent boost), and task-specific preferences that depend on each LMM’s visual capabilities. Although current LMMs are not yet capable of replacing domain expertise, they demonstrate promising potential to accelerate manuscript research through future human–AI collaboration.
Cite
Text
Zhang et al. "MS-Bench: Evaluating LMMs in Ancient Manuscript Study Through a Dunhuang Case Study." Advances in Neural Information Processing Systems, 2025.
Markdown
[Zhang et al. "MS-Bench: Evaluating LMMs in Ancient Manuscript Study Through a Dunhuang Case Study." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/zhang2025neurips-msbench/)
BibTeX
@inproceedings{zhang2025neurips-msbench,
title = {{MS-Bench: Evaluating LMMs in Ancient Manuscript Study Through a Dunhuang Case Study}},
author = {Zhang, Yuqing and Han, Yue and Zhu, Shuanghe and Wu, Haoxiang and Li, Hangqi and Zhang, Shengyu and Yan, Junchi and Liu, Zemin and Kuang, Kun and Dou, Huaiyong and Zhang, Yongquan and Wu, Fei},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
url = {https://mlanthology.org/neurips/2025/zhang2025neurips-msbench/}
}