Benchmarking Differentially Private Tabular Data Synthesis Algorithms
Abstract
Differentially private (DP) tabular data synthesis algorithms generate artificial data that preserves the statistical properties of private data while safeguarding individual privacy. However, the emergence of diverse algorithms in recent years has introduced challenges in practical applications, such as inconsistent data preprocessing methods and a lack of in-depth algorithm comparison and analysis. These factors create significant obstacles to selecting appropriate algorithms. In this paper, we address these challenges by proposing a novel benchmark for evaluating tabular data synthesis methods. We present a unified evaluation framework that integrates data preprocessing, feature selection, and data synthesis modules, facilitating fair and comprehensive comparisons. Our evaluation reveals that no single method consistently outperforms the rest across all scenarios. Furthermore, we conduct an in-depth experimental evaluation of each algorithmic module, offering insights into the strengths and limitations of different strategies. This lays the foundation for designing more robust and interpretable methods for private data synthesis. Source code is available at the anonymous link\footnote{\url{https://anonymous.4open.science/r/tab_bench-DE92/}}.
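To make the modular structure of the framework concrete, the sketch below outlines one way such a pipeline could be wired together, with separate preprocessing, feature selection, and synthesis stages as described in the abstract. It is a minimal illustration only: the names (`PipelineConfig`, `run_pipeline`, etc.) and interfaces are assumptions for exposition, not the benchmark's actual API.

```python
# Minimal sketch of a modular DP tabular-synthesis evaluation pipeline.
# All names and signatures here are illustrative placeholders, not the
# benchmark's actual API.
from dataclasses import dataclass
from typing import Callable

import pandas as pd


@dataclass
class PipelineConfig:
    epsilon: float  # total privacy budget allotted to the synthesizer
    preprocess: Callable[[pd.DataFrame], pd.DataFrame]        # e.g., discretize/encode columns
    select_features: Callable[[pd.DataFrame], list[str]]      # e.g., choose informative columns/marginals
    synthesize: Callable[[pd.DataFrame, float], pd.DataFrame] # DP synthesis algorithm under test


def run_pipeline(real: pd.DataFrame, cfg: PipelineConfig) -> pd.DataFrame:
    """Run preprocessing, feature selection, and DP synthesis in sequence."""
    processed = cfg.preprocess(real)
    columns = cfg.select_features(processed)
    synthetic = cfg.synthesize(processed[columns], cfg.epsilon)
    return synthetic
```

Holding two of the three stages fixed while swapping the third is what would allow the kind of per-module comparison the paper reports; note that in a real DP pipeline, preprocessing and feature selection may also need to consume part of the privacy budget.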
BibTeX
@inproceedings{chen2025iclrw-benchmarking,
title = {{Benchmarking Differentially Private Tabular Data Synthesis Algorithms}},
author = {Chen, Kai and Li, Xiaochen and Gong, Chen and McKenna, Ryan and Wang, Tianhao},
booktitle = {ICLR 2025 Workshops: SynthData},
year = {2025},
url = {https://mlanthology.org/iclrw/2025/chen2025iclrw-benchmarking/}
}