Algorithmically Effective Differentially Private Synthetic Data

Abstract

We present a highly effective algorithmic approach for generating $\varepsilon$-differentially private synthetic data in a bounded metric space with near-optimal utility guarantees under the 1-Wasserstein distance. In particular, for a dataset $\mathcal X$ in the hypercube $[0,1]^d$, our algorithm generates a synthetic dataset $\mathcal Y$ such that the expected 1-Wasserstein distance between the empirical measures of $\mathcal X$ and $\mathcal Y$ is $O((\varepsilon n)^{-1/d})$ for $d\geq 2$, and is $O(\log^2(\varepsilon n)(\varepsilon n)^{-1})$ for $d=1$. The accuracy guarantee is optimal up to a constant factor for $d\geq 2$, and up to a logarithmic factor for $d=1$. Our algorithm has a fast running time of $O(\varepsilon d n)$ for all $d\geq 1$ and demonstrates improved accuracy compared to the method of Boedihardjo et al. (2022) for $d\geq 2$.
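To make the problem statement concrete, here is a minimal Python sketch of $\varepsilon$-DP synthetic data generation on $[0,1]^d$ using the classical perturbed-histogram baseline (Laplace noise on bin counts, then resampling). This is an illustration of the task, not the paper's algorithm; the function name dp_synthetic_data and the bin parameter m are hypothetical choices for this sketch. Recall that the 1-Wasserstein distance between probability measures $\mu,\nu$ is $W_1(\mu,\nu)=\inf_{\gamma\in\Gamma(\mu,\nu)}\int\|x-y\|\,d\gamma(x,y)$, where $\Gamma(\mu,\nu)$ is the set of couplings of $\mu$ and $\nu$.

import numpy as np

def dp_synthetic_data(X, epsilon, m, rng=None):
    """Illustrative sketch (not the paper's algorithm): epsilon-DP
    synthetic data on [0, 1]^d via a perturbed histogram.

    X       : (n, d) array of points in [0, 1]^d
    epsilon : privacy budget
    m       : bins per coordinate, giving m**d cells in total
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    # Assign each point to one of m**d histogram cells.
    idx = np.clip((X * m).astype(int), 0, m - 1)
    flat = np.ravel_multi_index(idx.T, (m,) * d)
    counts = np.bincount(flat, minlength=m ** d).astype(float)
    # Adding or removing one record changes a single count by 1, so the
    # count vector has L1 sensitivity 1; Laplace(1/epsilon) noise on each
    # count yields epsilon-DP under the add/remove notion of neighbors.
    noisy = counts + rng.laplace(scale=1.0 / epsilon, size=counts.shape)
    probs = np.clip(noisy, 0.0, None)
    total = probs.sum()
    probs = probs / total if total > 0 else np.full_like(probs, 1.0 / probs.size)
    # Draw n synthetic points: pick a cell, then a uniform point inside it.
    cells = rng.choice(probs.size, size=n, p=probs)
    corners = np.stack(np.unravel_index(cells, (m,) * d), axis=1)
    return (corners + rng.random((n, d))) / m

In this baseline, the number of bins m trades the per-cell Laplace noise against the $O(1/m)$ discretization error in $W_1$; the paper's contribution is an algorithm whose accuracy, per the abstract, matches the optimal $O((\varepsilon n)^{-1/d})$ rate for $d\geq 2$ while running in time $O(\varepsilon d n)$.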

Cite

Text

He et al. "Algorithmically Effective Differentially Private Synthetic Data." Conference on Learning Theory, 2023.

Markdown

[He et al. "Algorithmically Effective Differentially Private Synthetic Data." Conference on Learning Theory, 2023.](https://mlanthology.org/colt/2023/he2023colt-algorithmically/)

BibTeX

@inproceedings{he2023colt-algorithmically,
  title     = {{Algorithmically Effective Differentially Private Synthetic Data}},
  author    = {He, Yiyun and Vershynin, Roman and Zhu, Yizhe},
  booktitle = {Conference on Learning Theory},
  year      = {2023},
  pages     = {3941--3968},
  volume    = {195},
  url       = {https://mlanthology.org/colt/2023/he2023colt-algorithmically/}
}