Conformalised Data Synthesis
Abstract
With the proliferation of increasingly complicated Deep Learning architectures, data synthesis is a highly promising technique to address the demand of data-hungry models. However, reliably assessing the quality of a ‘synthesiser’ model’s output is an open research question with significant associated risks for high-stake domains. To address this challenge, we propose a unique synthesis algorithm that generates data from high-confidence feature space regions based on the Conformal Prediction framework. We support our proposed algorithm with a comprehensive exploration of the core parameter’s influence, an in-depth discussion of practical advice, and an extensive empirical evaluation of five benchmark datasets. To show our approach’s versatility on ubiquitous real-world challenges, the datasets were carefully selected for their variety of difficult characteristics: low sample count, class imbalance, and non-separability. In all trials, training sets extended with our confident synthesised data performed at least as well as the original set and frequently significantly improved Deep Learning performance by up to 61 percentage points in F1-score.
Cite
Text
Meister and Nguyen. "Conformalised Data Synthesis." Machine Learning, 2025. doi:10.1007/S10994-024-06701-0
Markdown
[Meister and Nguyen. "Conformalised Data Synthesis." Machine Learning, 2025.](https://mlanthology.org/mlj/2025/meister2025mlj-conformalised/) doi:10.1007/S10994-024-06701-0
BibTeX
@article{meister2025mlj-conformalised,
title = {{Conformalised Data Synthesis}},
author = {Meister, Julia A. and Nguyen, Khuong An},
journal = {Machine Learning},
year = {2025},
pages = {57},
doi = {10.1007/S10994-024-06701-0},
volume = {114},
url = {https://mlanthology.org/mlj/2025/meister2025mlj-conformalised/}
}