Discriminator Based Corpus Generation for General Code Synthesis

Abstract

Current work on neural code synthesis trains increasingly sophisticated architectures on highly simplified domain-specific languages, drawing training programs by uniform sampling from the program space of those languages. By comparison, the program space of a C-like language is vast and extremely sparsely populated with "useful" functionality, so effective training requires a far more intelligent approach to corpus generation. We use genetic programming, guided by an iteratively retrained discriminator, to produce a population of programs suitable as labelled training data for a neural code synthesis architecture. We demonstrate that a discriminator-based training corpus generator, trained using only unlabelled problem specifications in classic Programming-by-Example format, greatly improves network performance compared to current uniform sampling techniques.
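The corpus-generation loop described in the abstract can be illustrated in miniature. The following is a minimal, hypothetical sketch, not the authors' implementation. It assumes a toy integer DSL (`PRIMITIVES`), a fixed set of probe inputs (`INPUTS`), hand-picked behavioural features, and a logistic-regression discriminator standing in for the paper's discriminator. Each generation, the discriminator is retrained from scratch to separate the unlabelled target I/O specifications from the current population's I/O behaviour. Its score then serves as the genetic-programming fitness. Every evolved program, paired with its own I/O examples, becomes one labelled training item.

```python
import math
import random

# Hypothetical toy DSL (an assumption for illustration): a program is a
# non-empty list of primitive op names applied in order to one integer.
PRIMITIVES = {
    "inc": lambda x: x + 1,
    "dec": lambda x: x - 1,
    "dbl": lambda x: x * 2,
    "neg": lambda x: -x,
    "sqr": lambda x: x * x,
}
INPUTS = [-2, -1, 0, 1, 2, 3]  # assumed fixed probe inputs shared by all specs
MAX_LEN = 6                    # assumed cap on program length


def run(program, x):
    """Execute a toy program on a single integer input."""
    for op in program:
        x = PRIMITIVES[op](x)
    return x


def features(outputs):
    """Hand-picked behavioural summary of an I/O spec (not the paper's encoding)."""
    n = len(outputs)
    mean = sum(outputs) / n
    distinct = len(set(outputs)) / n
    monotone = float(all(a <= b for a, b in zip(outputs, outputs[1:])))
    even = sum(1 for o in outputs if o % 2 == 0) / n
    return [1.0, math.tanh(mean / 10), distinct, monotone, even]  # 1.0 = bias term


def predict(w, f):
    """Discriminator score in (0, 1): how 'target-like' a spec looks."""
    z = sum(wi * fi for wi, fi in zip(w, f))
    z = max(-30.0, min(30.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


def train_discriminator(real, fake, epochs=200, lr=0.5):
    """Logistic regression by SGD: label 1 = target spec, 0 = generated program."""
    w = [0.0] * len(real[0])
    data = [(f, 1.0) for f in real] + [(f, 0.0) for f in fake]
    for _ in range(epochs):
        for f, y in data:
            err = y - predict(w, f)
            w = [wi + lr * err * fi for wi, fi in zip(w, f)]
    return w


def mutate(program, rng):
    """Insert, delete, or replace one op, keeping 1 <= len <= MAX_LEN."""
    prog = list(program)
    ops = list(PRIMITIVES)
    r = rng.random()
    if r < 0.4 and len(prog) < MAX_LEN:
        prog.insert(rng.randrange(len(prog) + 1), rng.choice(ops))
    elif r < 0.7 and len(prog) > 1:
        del prog[rng.randrange(len(prog))]
    else:
        prog[rng.randrange(len(prog))] = rng.choice(ops)
    return prog


def evolve(target_specs, generations=10, pop_size=30, seed=0):
    """Evolve a population whose I/O behaviour the discriminator rates as target-like."""
    rng = random.Random(seed)
    real = [features(outs) for outs in target_specs]  # unlabelled: outputs only
    pop = [[rng.choice(list(PRIMITIVES))] for _ in range(pop_size)]
    for _ in range(generations):
        behaviour = [features([run(p, x) for x in INPUTS]) for p in pop]
        w = train_discriminator(real, behaviour)  # retrained every generation
        ranked = sorted(zip(pop, behaviour), key=lambda pb: predict(w, pb[1]), reverse=True)
        survivors = [p for p, _ in ranked[: pop_size // 2]]
        children = [mutate(rng.choice(survivors), rng) for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    # Each evolved program, paired with its own I/O examples, is a labelled item.
    return [(p, [(x, run(p, x)) for x in INPUTS]) for p in pop]


# Unlabelled target specs: outputs of unknown functions on INPUTS (here x*x and 2*x).
corpus = evolve([[4, 1, 0, 1, 4, 9], [-4, -2, 0, 2, 4, 6]])
for program, examples in corpus[:3]:
    print(program, examples)
```

In this sketch the target specifications never carry programs; only the discriminator sees them, and the labels in the output corpus come from executing the evolved programs. Retraining the discriminator each generation keeps the fitness signal moving as the population drifts toward target-like behaviour, rather than letting programs settle on a fixed, stale classifier.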

Cite

Text

Wild and Porter. "Discriminator Based Corpus Generation for General Code Synthesis." International Conference on Learning Representations, 2020.

Markdown

[Wild and Porter. "Discriminator Based Corpus Generation for General Code Synthesis." International Conference on Learning Representations, 2020.](https://mlanthology.org/iclr/2020/wild2020iclr-discriminator/)

BibTeX

@inproceedings{wild2020iclr-discriminator,
  title     = {{Discriminator Based Corpus Generation for General Code Synthesis}},
  author    = {Wild, Alexander and Porter, Barry},
  booktitle = {International Conference on Learning Representations},
  year      = {2020},
  url       = {https://mlanthology.org/iclr/2020/wild2020iclr-discriminator/}
}