CodeLutra: Boosting LLM Code Generation via Preference-Guided Refinement

Abstract

Large Language Models (LLMs) have revolutionized code generation but require significant resources and tend to over-generalize, limiting their task-specific efficiency. Fine-tuning smaller, open-source LLMs is a cost-effective alternative, yet standard supervised approaches rely solely on correct examples, overlooking valuable insights from failures. We introduce CodeLutra, a new framework that leverages both correct and incorrect code attempts. Instead of training purely on correct solutions, CodeLutra applies iterative preference-based refinement, comparing successful and failed outputs to better approximate desired results. This process narrows the performance gap with state-of-the-art, larger models, without requiring massive datasets or auxiliary models. For example, on a challenging data science coding task, using only 500 samples improved Llama-3-8B's accuracy from 28.2% to 48.6%, approaching GPT-4's level. By capitalizing on both successes and mistakes, CodeLutra offers a scalable, efficient path to high-quality code generation, making smaller open-source models more competitive with leading closed-source alternatives.
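To make the preference-guided idea concrete, here is a minimal sketch (not the paper's implementation) of how one might pair correct and incorrect code attempts and score them with a standard direct-preference-style objective; the function names, the test-based pass/fail labels, and the placeholder log-probabilities are illustrative assumptions, not details taken from CodeLutra.

```python
import math
from typing import List, Tuple


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Direct-preference-style loss for one (chosen, rejected) pair.

    The policy is rewarded for assigning relatively more probability to the
    correct (chosen) code than to the incorrect (rejected) code, measured
    against a frozen reference model.
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))


def build_preference_pairs(attempts: List[Tuple[str, bool]]) -> List[Tuple[str, str]]:
    """Pair every passing attempt with every failing attempt for a task.

    `attempts` holds (code, passed_checks) tuples produced by sampling the
    model and executing each candidate against task-specific checks.
    """
    correct = [code for code, ok in attempts if ok]
    incorrect = [code for code, ok in attempts if not ok]
    return [(c, r) for c in correct for r in incorrect]


if __name__ == "__main__":
    # Toy example: two sampled attempts for one task, one passes, one fails.
    attempts = [("def add(a, b): return a + b", True),
                ("def add(a, b): return a - b", False)]
    pairs = build_preference_pairs(attempts)
    # Placeholder log-probabilities standing in for real model scores.
    print(pairs, dpo_loss(-12.0, -11.0, -12.5, -11.2))
```

In an iterative setup of this kind, the fine-tuned model would be re-sampled each round to produce fresh correct/incorrect attempts, and new preference pairs would be built from them before the next refinement step.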

Cite

Text

Tao et al. "CodeLutra: Boosting LLM Code Generation via Preference-Guided Refinement." Transactions on Machine Learning Research, 2025.

Markdown

[Tao et al. "CodeLutra: Boosting LLM Code Generation via Preference-Guided Refinement." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/tao2025tmlr-codelutra/)

BibTeX

@article{tao2025tmlr-codelutra,
  title     = {{CodeLutra: Boosting LLM Code Generation via Preference-Guided Refinement}},
  author    = {Tao, Leitian and Chen, Xiang and Yu, Tong and Mai, Tung and Rossi, Ryan A. and Li, Yixuan and Mitra, Saayan},
  journal   = {Transactions on Machine Learning Research},
  year      = {2025},
  url       = {https://mlanthology.org/tmlr/2025/tao2025tmlr-codelutra/}
}