RECODE-H: A Benchmark for Research Code Development with Interactive Human Feedback

Miao, Chunyu; Zou, Henry Peng; Li, Yangning; Chen, Yankai; Wang, Yibo; Wang, Fangxin; Li, Yifan; Yang, Wooseong; He, Bowei; Zhang, Xinni; Yu, Dianzhi; Yang, Hanchen; Nguyen, Hoang H; Zhou, Yue; Yang, Jie; Guo, Jizhou; Fan, Wenzhe; Yeh, Chin-Yuan; Meng, Panpan; Fang, Liancheng; Qi, Jinhu; Huang, Wei-Chieh; Gu, Zhengyao; Han, Yuwei; He, Langzhou; Yang, Yuyao; Liu, Xue; King, Irwin; Yu, Philip S.

ICLR 2026

Abstract

Large language models (LLMs) show promise in supporting scientific research implementation, yet their ability to generate correct and executable code remains limited. Existing works largely adopt one-shot settings, ignoring the iterative, feedback-driven nature of realistic scientific research development workflows. To address this gap, we present RECODE-H, a benchmark of 102 tasks from research papers and repositories that evaluates LLMs through multi-turn interactions with human feedback. It includes structured instructions, unit tests, and a five-level feedback hierarchy to reflect realistic researcher–agent collaboration. We further present ReCodeAgent, a framework that integrates feedback into iterative code generation. Experiments with leading LLMs, including GPT-5, Claude-Sonnet-4, DeepSeek-V3.1, and Gemini 2.5, show substantial performance gains with richer feedback, while also highlighting ongoing challenges in the generation of complex research code. RECODE-H establishes a foundation for developing adaptive, feedback-driven LLM agents in scientific research implementation.

Cite

Text

Miao et al. "RECODE-H: A Benchmark for Research Code Development with Interactive Human Feedback." International Conference on Learning Representations, 2026.

Markdown

[Miao et al. "RECODE-H: A Benchmark for Research Code Development with Interactive Human Feedback." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/miao2026iclr-recodeh/)

BibTeX

@inproceedings{miao2026iclr-recodeh,
  title     = {{RECODE-H: A Benchmark for Research Code Development with Interactive Human Feedback}},
  author    = {Miao, Chunyu and Zou, Henry Peng and Li, Yangning and Chen, Yankai and Wang, Yibo and Wang, Fangxin and Li, Yifan and Yang, Wooseong and He, Bowei and Zhang, Xinni and Yu, Dianzhi and Yang, Hanchen and Nguyen, Hoang H and Zhou, Yue and Yang, Jie and Guo, Jizhou and Fan, Wenzhe and Yeh, Chin-Yuan and Meng, Panpan and Fang, Liancheng and Qi, Jinhu and Huang, Wei-Chieh and Gu, Zhengyao and Han, Yuwei and He, Langzhou and Yang, Yuyao and Liu, Xue and King, Irwin and Yu, Philip S.},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/miao2026iclr-recodeh/}
}