FinMMR: Make Financial Numerical Reasoning More Multimodal, Comprehensive, and Challenging

Abstract

We present FinMMR, a novel bilingual multimodal benchmark tailored to evaluate the reasoning capabilities of multimodal large language models (MLLMs) on financial numerical reasoning tasks. Compared to existing benchmarks, our work introduces three significant advancements. (1) Multimodality: We meticulously transform existing financial reasoning benchmarks and construct novel questions from the latest Chinese financial research reports. FinMMR comprises 4.3K questions and 8.7K images spanning 14 categories, including tables, bar charts, and ownership structure charts. (2) Comprehensiveness: FinMMR encompasses 14 financial subdomains, including corporate finance, banking, and industry analysis, significantly exceeding existing benchmarks in the breadth of financial domain knowledge. (3) Challenge: Models are required to perform multi-step, precise numerical reasoning by integrating financial knowledge with an understanding of complex financial images and text. The best-performing MLLM achieves only 51.4% accuracy on Hard problems. We believe that FinMMR will drive advancements in enhancing the reasoning capabilities of MLLMs in real-world scenarios.

Cite

Text

Tang et al. "FinMMR: Make Financial Numerical Reasoning More Multimodal, Comprehensive, and Challenging." International Conference on Computer Vision, 2025.

Markdown

[Tang et al. "FinMMR: Make Financial Numerical Reasoning More Multimodal, Comprehensive, and Challenging." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/tang2025iccv-finmmr/)

BibTeX

@inproceedings{tang2025iccv-finmmr,
  title     = {{FinMMR: Make Financial Numerical Reasoning More Multimodal, Comprehensive, and Challenging}},
  author    = {Tang, Zichen and E, Haihong and Liu, Jiacheng and Yang, Zhongjun and Li, Rongjin and Rong, Zihua and He, Haoyang and Hao, Zhuodi and Hu, Xinyang and Ji, Kun and Ma, Ziyan and Ji, Mengyuan and Zhang, Jun and Ma, Chenghao and Zheng, Qianhe and Liu, Yang and Huang, Yiling and Hu, Xinyi and Huang, Qing and Xie, Zijian and Peng, Shiyao},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {3245--3257},
  url       = {https://mlanthology.org/iccv/2025/tang2025iccv-finmmr/}
}