HAPNEST: An Efficient Tool for Generating Large-Scale Genetics Datasets from Limited Training Data

Abstract

In this extended abstract, we present HAPNEST, a new, highly efficient software tool that lets machine learning practitioners easily generate and evaluate large synthetic datasets for human genetics applications. HAPNEST generates diverse synthetic datasets from small, publicly accessible reference datasets. We demonstrate the suitability of HAPNEST-generated data for supervised tasks such as genetic risk scoring.

Cite

Text

Wharrie et al. "HAPNEST: An Efficient Tool for Generating Large-Scale Genetics Datasets from Limited Training Data." NeurIPS 2022 Workshops: SyntheticData4ML, 2022.

Markdown

[Wharrie et al. "HAPNEST: An Efficient Tool for Generating Large-Scale Genetics Datasets from Limited Training Data." NeurIPS 2022 Workshops: SyntheticData4ML, 2022.](https://mlanthology.org/neuripsw/2022/wharrie2022neuripsw-hapnest/)

BibTeX

@inproceedings{wharrie2022neuripsw-hapnest,
  title     = {{HAPNEST: An Efficient Tool for Generating Large-Scale Genetics Datasets from Limited Training Data}},
  author    = {Wharrie, Sophie and Yang, Zhiyu and Raj, Vishnu and Monti, Remo and Gupta, Rahul and Wang, Ying and Martin, Alicia and O'Connor, Luke J and Kaski, Samuel and Marttinen, Pekka and Palamara, Pier and Lippert, Christoph and Ganna, Andrea},
  booktitle = {NeurIPS 2022 Workshops: SyntheticData4ML},
  year      = {2022},
  url       = {https://mlanthology.org/neuripsw/2022/wharrie2022neuripsw-hapnest/}
}