Mathesis: Towards Formal Theorem Proving from Natural Languages
Abstract
Recent advances in large language models (LLMs) show strong promise for formal reasoning. However, most LLM-based theorem provers remain constrained by the need for expert-written formal statements as inputs, limiting their applicability to real-world problems expressed in natural language. We address this gap by focusing on autoformalization, the task of translating informal problems into formal statements. We propose Mathesis, the first pipeline for the systematic study of formal theorem proving from natural language. It contributes the first autoformalizer trained with reinforcement learning, which integrates syntactic, semantic, and prover feedback as reward signals to yield accurate and verifiable formalizations. This is further supported by our novel LeanScorer framework for evaluating semantic correctness. To assess real-world applicability, we introduce Gaokao-Formal, a benchmark of 495 complex proof problems from the college entrance exams. Experiments demonstrate that our autoformalizer improves pass rates by 45% on Gaokao-Formal and 6% on MiniF2F compared to state-of-the-art baselines. Paired with provers, our autoformalizer consistently enhances proving accuracy, including a 42% gain for DeepSeek-Prover-V2 on Gaokao-Formal. Our code is available at https://github.com/Huawei-AI4Math/Mathesis.
Cite
Text
Xuejun et al. "Mathesis: Towards Formal Theorem Proving from Natural Languages." International Conference on Learning Representations, 2026.Markdown
[Xuejun et al. "Mathesis: Towards Formal Theorem Proving from Natural Languages." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/xuejun2026iclr-mathesis/)BibTeX
@inproceedings{xuejun2026iclr-mathesis,
title = {{Mathesis: Towards Formal Theorem Proving from Natural Languages}},
author = {Xuejun, Yu and Zhong, Jianyuan and Feng, Zijin and Zhai, Pengyi and Yousefzadeh, Roozbeh and Ng, Wei Chong and Liu, Haoxiong and Shou, Ziyi and Xiong, Jing and Zhou, Yudong and Ong, Claudia Beth and Sugiarto, Austen Jeremy and Zhang, Yaoxi and Tai, Wai Ming and Cao, Huan and Lu, Dongcai and Sun, Jiacheng and Xu, Qiang and Xin, Shen and Li, Zhenguo},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/xuejun2026iclr-mathesis/}
}