FormalML: A Benchmark for Evaluating Formal Subgoal Completion in Machine Learning Theory

Abstract

Large language models (LLMs) have recently demonstrated remarkable progress in formal theorem proving. Yet their ability to serve as practical assistants for mathematicians, filling in missing steps within complex proofs, remains underexplored. We identify this challenge as the task of subgoal completion, where an LLM must discharge short but nontrivial proof obligations left unresolved in a human-provided sketch. To study this problem, we introduce FormalML, a Lean 4 benchmark built from foundational theories of machine learning. Using a translation tactic that converts procedural proofs into declarative form, we extract 4,937 problems spanning optimization and probability inequalities, with varying levels of difficulty. FormalML is the first subgoal completion benchmark to combine premise retrieval and complex research-level contexts. Evaluation of state-of-the-art provers highlights persistent limitations in accuracy and efficiency, underscoring the need for more capable LLM-based theorem provers for effective subgoal completion.

Cite

Text

Yang et al. "FormalML: A Benchmark for Evaluating Formal Subgoal Completion in Machine Learning Theory." International Conference on Learning Representations, 2026.

Markdown

[Yang et al. "FormalML: A Benchmark for Evaluating Formal Subgoal Completion in Machine Learning Theory." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/yang2026iclr-formalml/)

BibTeX

@inproceedings{yang2026iclr-formalml,
  title     = {{FormalML: A Benchmark for Evaluating Formal Subgoal Completion in Machine Learning Theory}},
  author    = {Yang, Xiao-Wen and Zhang, Zihao and Cao, Jianuo and Zhou, Zhi and Li, Zenan and Guo, Lan-Zhe and Yao, Yuan and Chen, Taolue and Li, Yu-Feng and Ma, Xiaoxing},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/yang2026iclr-formalml/}
}