Ineq-Comp: Benchmarking Human-Intuitive Compositional Reasoning in Automated Theorem Proving of Inequalities

Abstract

LLM-based formal proof assistants (e.g., in Lean) hold great promise for automating mathematical discovery. But beyond syntactic correctness, do these systems truly understand mathematical structure as humans do? We investigate this question in context of mathematical inequalities---specifically the prover's ability to recognize that the given problem simplifies by applying a known inequality such as AM/GM. Specifically, we are interested in their ability to do this in a {\em compositional setting} where multiple inequalities must be applied as part of a solution. We introduce \ineqcomp, a benchmark built from elementary inequalities through systematic transformations, including variable duplication, algebraic rewriting, and multi-step composition. Although these problems remain easy for humans, we find that most provers---including Goedel, STP, and Kimina-7B---struggle significantly. DeepSeek-Prover-V2-7B shows relative robustness, but still suffers a 20\% performance drop (pass@32). Even for DeepSeek-Prover-V2-671B model, the gap between compositional variants and seed problems exists, implying that simply scaling up the model size alone does not fully solve the compositional weakness. Strikingly, performance remains poor for all models even when formal proofs of the constituent parts are provided in context, revealing that the source of weakness is indeed in compositional reasoning. Our results expose a persisting gap between the generalization behavior of current AI provers and human mathematical intuition. All data and evaluation code can be found at \url{https://github.com/haoyuzhao123/LeanIneqComp}.

Cite

Text

Zhao et al. "Ineq-Comp: Benchmarking Human-Intuitive Compositional Reasoning in Automated Theorem Proving of Inequalities." Advances in Neural Information Processing Systems, 2025.

Markdown

[Zhao et al. "Ineq-Comp: Benchmarking Human-Intuitive Compositional Reasoning in Automated Theorem Proving of Inequalities." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/zhao2025neurips-ineqcomp/)

BibTeX

@inproceedings{zhao2025neurips-ineqcomp,
  title     = {{Ineq-Comp: Benchmarking Human-Intuitive Compositional Reasoning in Automated Theorem Proving of Inequalities}},
  author    = {Zhao, Haoyu and Geng, Yihan and Tang, Shange and Lin, Yong and Lyu, Bohan and Lin, Hongzhou and Jin, Chi and Arora, Sanjeev},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/zhao2025neurips-ineqcomp/}
}