CLEVER: A Curated Benchmark for Formally Verified Code Generation
Abstract
We introduce ${\rm C{\small LEVER}}$, a high-quality, manually curated benchmark of 161 problems for end-to-end verified code generation in Lean. Each problem consists of (1) the task of generating a specification that matches a held-out ground-truth specification, and (2) the task of generating a Lean implementation that provably satisfies this specification. Unlike prior benchmarks,${\rm C{\small LEVER}}$ avoids test-case supervision, LLM-generated annotations, and specifications that leak implementation logic or allow vacuous solutions. All outputs are verified post-hoc using Lean's type checker to ensure machine-checkable correctness. We use ${\rm C{\small LEVER}}$ to evaluate several few-shot and agentic approaches based on state-of-the-art language models. These methods all struggle to achieve full verification, establishing it as a challenging frontier benchmark for program synthesis and formal reasoning. Our benchmark can be found on [GitHub](https://github.com/trishullab/clever) as well as [HuggingFace](https://huggingface.co/datasets/amitayusht/clever). All our evaluation code is also available [online](https://github.com/trishullab/clever-prover).
Cite
Text
Thakur et al. "CLEVER: A Curated Benchmark for Formally Verified Code Generation." Advances in Neural Information Processing Systems, 2025.Markdown
[Thakur et al. "CLEVER: A Curated Benchmark for Formally Verified Code Generation." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/thakur2025neurips-clever/)BibTeX
@inproceedings{thakur2025neurips-clever,
title = {{CLEVER: A Curated Benchmark for Formally Verified Code Generation}},
author = {Thakur, Amitayush and Lee, Jasper and Tsoukalas, George and Sistla, Meghana and Zhao, Matthew and Zetzsche, Stefan and Durrett, Greg and Yue, Yisong and Chaudhuri, Swarat},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
url = {https://mlanthology.org/neurips/2025/thakur2025neurips-clever/}
}