Analysis of Differentially Private Synthetic Data: A Measurement Error Approach
Abstract
Differentially private (DP) synthetic datasets have received significant attention from academia, industry, and government. However, little is known about how to perform statistical inference with them. Naive approaches that ignore the uncertainty induced by the DP mechanism yield biased estimators and invalid inferences. In this paper, we establish a connection between additive DP mechanisms and measurement error models, and use it to present a class of easy-to-implement, bias-corrected DP estimators, based on maximum likelihood estimation (MLE), with valid asymptotic confidence intervals (CIs) for parameters in regression settings. Our simulations show that the proposed estimator performs comparably to the widely used sufficient statistic perturbation (SSP) algorithm in some scenarios, with the added advantages of releasing a synthetic dataset and providing statistically valid asymptotic CIs, which can achieve better coverage than the naive CIs obtained by ignoring the DP mechanism.
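The bias that the abstract attributes to naive analysis can be illustrated with a small simulation. The sketch below is not the paper's MLE-based estimator. It swaps in a textbook method-of-moments correction for classical measurement error, applied to a simple linear regression whose covariate and response are both released with additive Laplace noise. The function names, the noise scale, the sample size, and the true slope are all assumptions chosen for illustration, and the noise scale is not calibrated to any privacy budget or sensitivity bound.

```python
import random


def laplace(rng, scale):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def slope_estimates(n=20000, beta=2.0, scale=0.5, seed=0):
    """Return (naive, corrected) slope estimates from noisy released data.

    Hypothetical setup: x ~ N(0, 1), y = beta * x + N(0, 1), and both
    columns are released with independent additive Laplace(scale) noise.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]

    # Released (synthetic) columns: original values plus additive noise.
    w = [xi + laplace(rng, scale) for xi in x]
    v = [yi + laplace(rng, scale) for yi in y]

    mean_w = sum(w) / n
    mean_v = sum(v) / n
    s_wv = sum((wi - mean_w) * (vi - mean_v) for wi, vi in zip(w, v)) / (n - 1)
    s_ww = sum((wi - mean_w) ** 2 for wi in w) / (n - 1)

    # The variance of Laplace(0, scale) noise is 2 * scale**2. It is treated
    # as known here because the noise mechanism is public.
    noise_var = 2.0 * scale ** 2

    naive = s_wv / s_ww                    # ignores the added noise
    corrected = s_wv / (s_ww - noise_var)  # subtracts the known noise variance
    return naive, corrected


if __name__ == "__main__":
    naive, corrected = slope_estimates()
    print(f"naive slope:     {naive:.3f}")
    print(f"corrected slope: {corrected:.3f}")
```

Under these assumptions the naive slope is attenuated toward zero, converging to beta * Var(x) / (Var(x) + 2 * scale^2), while the corrected slope is consistent for beta. Noise added to the response does not bias the slope; it only inflates its variance. Valid confidence intervals, which the paper addresses, would need additional work beyond this point estimate.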
Cite
Text
Jiang et al. "Analysis of Differentially Private Synthetic Data: A Measurement Error Approach." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I19.30114
Markdown
[Jiang et al. "Analysis of Differentially Private Synthetic Data: A Measurement Error Approach." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/jiang2024aaai-analysis/) doi:10.1609/AAAI.V38I19.30114
BibTeX
@inproceedings{jiang2024aaai-analysis,
title = {{Analysis of Differentially Private Synthetic Data: A Measurement Error Approach}},
author = {Jiang, Yangdi and Liu, Yi and Yan, Xiaodong and Charest, Anne-Sophie and Kong, Linglong and Jiang, Bei},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2024},
pages = {21206-21213},
doi = {10.1609/AAAI.V38I19.30114},
url = {https://mlanthology.org/aaai/2024/jiang2024aaai-analysis/}
}