DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation

Abstract

We introduce DS-1000, a code generation benchmark with a thousand data science problems spanning seven Python libraries, such as NumPy and Pandas. Compared to prior work, DS-1000 incorporates three core features. First, our problems reflect diverse, realistic, and practical use cases since we collected them from StackOverflow. Second, our automatic evaluation is highly specific (reliable) – across all Codex-002-predicted solutions that our evaluation accepts, only 1.8% are incorrect; we achieve this with multi-criteria metrics, checking both functional correctness by running test cases and surface-form constraints by restricting API usage or keywords. Finally, we proactively defend against memorization by slightly modifying our problems to be different from the original StackOverflow source; consequently, models cannot answer them correctly by memorizing the solutions from pre-training. The current best public system (Codex-002) achieves 43.3% accuracy, leaving ample room for improvement. We release our benchmark at https://ds1000-code-gen.github.io.
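To illustrate the multi-criteria idea described in the abstract, the following is a minimal sketch rather than the DS-1000 evaluation harness itself. It accepts a candidate solution only if it both passes test cases and satisfies a surface-form constraint. The function names (`violates_surface_form`, `passes_tests`, `accept`), the convention that a solution stores its answer in a variable named `result`, and the use of plain Python lists instead of a data science library are all assumptions made for illustration.

```python
import ast


def violates_surface_form(code, banned_names=(), ban_loops=False):
    """Return True if the code uses a banned name/attribute or an explicit loop.

    Hypothetical helper: checks the syntax tree instead of raw text so that
    matches inside strings or comments are not counted.
    """
    tree = ast.parse(code)
    for node in ast.walk(tree):
        if ban_loops and isinstance(node, (ast.For, ast.While)):
            return True
        if isinstance(node, ast.Attribute) and node.attr in banned_names:
            return True
        if isinstance(node, ast.Name) and node.id in banned_names:
            return True
    return False


def passes_tests(code, test_cases, result_name="result"):
    """Run the candidate on each test input and compare with the expected output.

    Assumption: the candidate writes its answer to a variable called `result`.
    """
    for inputs, expected in test_cases:
        namespace = dict(inputs)  # fresh namespace per test case
        try:
            exec(code, namespace)
        except Exception:
            return False
        if namespace.get(result_name) != expected:
            return False
    return True


def accept(code, test_cases, banned_names=(), ban_loops=False):
    """Accept only if BOTH criteria hold: surface form and functional correctness."""
    try:
        if violates_surface_form(code, banned_names, ban_loops):
            return False
    except SyntaxError:
        return False
    return passes_tests(code, test_cases)


# Two functionally equivalent candidates for "sum the list without a loop".
tests = [({"xs": [1, 2, 3]}, 6), ({"xs": []}, 0)]
no_loop = "result = sum(xs)"
with_loop = "result = 0\nfor x in xs:\n    result += x"

print(accept(no_loop, tests, ban_loops=True))    # passes both criteria
print(accept(with_loop, tests, ban_loops=True))  # correct output, but uses a loop
```

In this sketch the looped candidate produces the right answer yet is rejected, which is the kind of case a test-only metric would miss. Running untrusted model output with `exec` is unsafe outside a sandbox; it is used here only to keep the example short.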

Cite

Text

Lai et al. "DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation." International Conference on Machine Learning, 2023.

Markdown

[Lai et al. "DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation." International Conference on Machine Learning, 2023.](https://mlanthology.org/icml/2023/lai2023icml-ds1000/)

BibTeX

@inproceedings{lai2023icml-ds1000,
  title     = {{DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation}},
  author    = {Lai, Yuhang and Li, Chengxi and Wang, Yiming and Zhang, Tianyi and Zhong, Ruiqi and Zettlemoyer, Luke and Yih, Wen-Tau and Fried, Daniel and Wang, Sida and Yu, Tao},
  booktitle = {International Conference on Machine Learning},
  year      = {2023},
  pages     = {18319--18345},
  volume    = {202},
  url       = {https://mlanthology.org/icml/2023/lai2023icml-ds1000/}
}