EDIT-Bench: Evaluating LLM Abilities to Perform Real-World Instructed Code Edits

Chi, Wayne; Chen, Valerie; Shar, Ryan; Mittal, Aditya; Liang, Jenny; Chiang, Wei-Lin; Angelopoulos, Anastasios Nikolas; Stoica, Ion; Neubig, Graham; Talwalkar, Ameet; Donahue, Chris

ICLR 2026

Abstract

Instructed code editing, where LLMs directly modify a developer's existing code based on a user instruction, is becoming a widely used interaction mode in AI coding assistants. However, few benchmarks directly evaluate this capability, and current datasets often rely on artificial sources. We introduce EditBench, a benchmark for evaluating LLM code editing capabilities grounded in real-world usage, i.e., user instructions and code contexts collected in the wild. EditBench comprises 545 problems, multiple natural and programming languages, and a diverse set of real-world use cases, ranging from resolving errors to adding features. EditBench introduces context-dependent problems that require the model to understand code context, highlighted code, and cursor position in addition to the user instruction. We evaluate 40 diverse LLMs and observe that EditBench is a challenging set of problems where only 3 models score over 60%. We find that model performance varies across different categories of user instructions. Further, we find that varying levels of contextual information greatly affect task success rate, with performance varying up to 11%, indicating the importance of evaluating with realistic context.

Cite

Text

Chi et al. "EDIT-Bench: Evaluating LLM Abilities to Perform Real-World Instructed Code Edits." International Conference on Learning Representations, 2026.

Markdown

[Chi et al. "EDIT-Bench: Evaluating LLM Abilities to Perform Real-World Instructed Code Edits." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/chi2026iclr-editbench/)

BibTeX

@inproceedings{chi2026iclr-editbench,
  title     = {{EDIT-Bench: Evaluating LLM Abilities to Perform Real-World Instructed Code Edits}},
  author    = {Chi, Wayne and Chen, Valerie and Shar, Ryan and Mittal, Aditya and Liang, Jenny and Chiang, Wei-Lin and Angelopoulos, Anastasios Nikolas and Stoica, Ion and Neubig, Graham and Talwalkar, Ameet and Donahue, Chris},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/chi2026iclr-editbench/}
}