Grounding Code Generation with Input-Output Specifications

Abstract

Large language models (LLMs) have demonstrated significant potential in code generation. However, the code generated by these models occasionally deviates from the user's intent, resulting in executable but incorrect programs. To mitigate this issue, we propose Gift4Code, a novel approach to instruction fine-tuning LLMs specifically for code generation. Our method leverages synthetic data produced by the LLM itself and uses execution-derived feedback as a key learning signal. This feedback, in the form of program input-output specifications, is provided to the LLM to facilitate fine-tuning. We evaluated our approach on two challenging data science benchmarks, Arcade and DS-1000. Our results suggest that the method improves the LLM's alignment with user intent, reducing the incidence of executable but incorrect outputs.
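
To make the idea concrete, below is a minimal sketch (not from the paper) of how an execution-derived input-output specification might be extracted from a generated program and attached to a fine-tuning example. The pandas setting, the `derive_io_spec` helper, the `result` variable convention, and the spec format are all illustrative assumptions, not the authors' actual pipeline.

```python
import pandas as pd

def derive_io_spec(program: str, env: dict) -> str:
    """Execute a candidate program and summarize its output as an
    input-output specification string. Hypothetical helper; the paper's
    actual spec format is not reproduced here."""
    local_env = dict(env)  # copy so execution cannot mutate the inputs
    exec(program, {}, local_env)  # run the LLM-generated code
    result = local_env.get("result")
    if isinstance(result, pd.DataFrame):
        # Summarize a DataFrame output by its column schema and size.
        cols = ", ".join(f"{c}: {t}" for c, t in result.dtypes.items())
        return f"Output: DataFrame[{cols}] with {len(result)} rows"
    return f"Output: {type(result).__name__} = {result!r}"

# Usage: attach the derived spec to the natural-language intent so the
# fine-tuning example grounds the instruction in concrete I/O behavior.
env = {"df": pd.DataFrame({"price": [3.0, 5.0], "qty": [2, 1]})}
program = "result = (df['price'] * df['qty']).sum()"
intent = "Compute total revenue."
spec = derive_io_spec(program, env)
training_input = f"{intent}\n# {spec}"
print(training_input)
```

In this sketch the specification acts as the grounding signal: the model is fine-tuned to produce code consistent with both the natural-language intent and the observed input-output behavior.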

Cite

Text

Wen et al. "Grounding Code Generation with Input-Output Specifications." NeurIPS 2023 Workshops: Instruction, 2023.

Markdown

[Wen et al. "Grounding Code Generation with Input-Output Specifications." NeurIPS 2023 Workshops: Instruction, 2023.](https://mlanthology.org/neuripsw/2023/wen2023neuripsw-grounding/)

BibTeX

@inproceedings{wen2023neuripsw-grounding,
  title     = {{Grounding Code Generation with Input-Output Specifications}},
  author    = {Wen, Yeming and Yin, Pengcheng and Shi, Kensen and Michalewski, Henryk and Chaudhuri, Swarat and Polozov, Alex},
  booktitle = {NeurIPS 2023 Workshops: Instruction},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/wen2023neuripsw-grounding/}
}