MathNet: A Global Multimodal Benchmark for Mathematical Reasoning and Retrieval

Alshammari, Shaden; Wen, Kevin; Zainal, Abrar; Hamilton, Mark; Safaei, Navid; Albarakati, Sultan; Freeman, William T.; Torralba, Antonio

ICLR 2026

Abstract

Mathematical problem solving remains a challenging test of reasoning for large language and multimodal models, yet existing benchmarks are limited in size, language coverage, and task diversity. We introduce **MathNet**, a high-quality, large-scale, multimodal, and multilingual dataset of Olympiad-level math problems together with a benchmark for evaluating mathematical reasoning in generative models and mathematical retrieval in embedding-based systems. **MathNet** spans 47 countries, 16 languages, and two decades of competitions, comprising **30,676 expert-authored problems with solutions** across diverse domains. In addition to the core dataset, we construct a retrieval benchmark consisting of mathematically equivalent and structurally similar problem pairs curated by human experts. **MathNet** supports three tasks: (i) mathematical problem solving, (ii) problem retrieval, and (iii) retrieval-augmented problem solving (math RAG). Experimental results show that even state-of-the-art reasoning models (**78.4% for `Gemini-3.1-Pro` and 69.3% for `GPT-5`**) remain challenged, while embedding models struggle to retrieve equivalent problems. We further show that RAG performance is highly sensitive to retrieval quality; for example, `DeepSeek-V3.2-Speciale` achieves gains of up to **12%**, obtaining the highest scores on the benchmark. **MathNet** provides the largest high-quality Olympiad dataset together with the first benchmark for evaluating mathematical problem retrieval, and we publicly release both the dataset and benchmark at [https://mathnet.csail.mit.edu](https://mathnet.csail.mit.edu).

Cite

Text

Alshammari et al. "MathNet: A Global Multimodal Benchmark for Mathematical Reasoning and Retrieval." International Conference on Learning Representations, 2026.

Markdown

[Alshammari et al. "MathNet: A Global Multimodal Benchmark for Mathematical Reasoning and Retrieval." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/alshammari2026iclr-mathnet/)

BibTeX

@inproceedings{alshammari2026iclr-mathnet,
  title     = {{MathNet: A Global Multimodal Benchmark for Mathematical Reasoning and Retrieval}},
  author    = {Alshammari, Shaden and Wen, Kevin and Zainal, Abrar and Hamilton, Mark and Safaei, Navid and Albarakati, Sultan and Freeman, William T. and Torralba, Antonio},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/alshammari2026iclr-mathnet/}
}