Differentially Private Data Release for Mixed-Type Data via Latent Factor Models

Abstract

Differential privacy is a privacy-preserving framework that allows synthetic data or statistical analysis results to be released while disclosing minimal private information about individual records. Balancing privacy protection against utility remains a central challenge for differential privacy, especially in synthetic data generation. In this paper, we propose a differentially private data synthesis algorithm for correlated mixed-type data based on latent factor models. The proposed method adds a relatively small amount of noise to the synthetic data at a given level of privacy protection while still capturing correlation information. Moreover, the algorithm generates synthetic data whose variable types match those of the mixed-type original data, which greatly improves the utility of the synthetic data. The key idea is to perturb the factor matrix and the factor loading matrix to construct a synthetic data generation model, and to use privacy-protected link functions to keep the types of the synthetic data consistent with those of the original data. The proposed method generates privacy-preserving synthetic data at low computational cost even when the original data are high-dimensional. Theoretically, we establish the differential privacy properties of the proposed method. Our numerical studies also demonstrate superb performance of the proposed method in preserving the utility of statistical analyses based on the privacy-preserved synthetic data.
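The key idea stated in the abstract (perturb a factor matrix and a loading matrix, then map the latent reconstruction back to each column's data type) can be illustrated with a small toy sketch. This is not the authors' algorithm: the function name `synthesize`, the truncated-SVD factorization, the Laplace noise, the `noise_scale` parameter, and the clipped-identity mapping used for binary columns are all assumptions made for illustration. In particular, the noise here is not calibrated to any sensitivity bound, so the sketch carries no formal differential privacy guarantee.

```python
import numpy as np

def synthesize(X, col_types, rank=2, noise_scale=0.1, seed=0):
    """Toy latent-factor synthesizer (hypothetical; no formal privacy guarantee)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)

    # Low-rank factorization of the centered data: X - mean ~= F @ L.T
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    F = U[:, :rank] * s[:rank]   # factor matrix, one row per record
    L = Vt[:rank].T              # loading matrix, one row per column

    # Perturb both matrices; noise_scale is an uncalibrated placeholder.
    F_noisy = F + rng.laplace(scale=noise_scale, size=F.shape)
    L_noisy = L + rng.laplace(scale=noise_scale, size=L.shape)
    latent = F_noisy @ L_noisy.T + mean

    # Map each latent column back to its declared data type.
    synth = np.empty_like(latent)
    for j, kind in enumerate(col_types):
        if kind == "binary":
            # Assumption: a clipped identity stands in for a proper link
            # function such as the logit; values are then sampled as 0/1.
            p = np.clip(latent[:, j], 0.0, 1.0)
            synth[:, j] = rng.binomial(1, p)
        else:
            # Continuous columns use the identity mapping.
            synth[:, j] = latent[:, j]
    return synth

# Usage on made-up data: one continuous and one binary column.
demo_rng = np.random.default_rng(1)
X_demo = np.column_stack([demo_rng.normal(size=50),
                          demo_rng.integers(0, 2, size=50)])
S_demo = synthesize(X_demo, ["continuous", "binary"], rank=1)
```

The point of the per-column mapping is that the synthetic binary column contains only 0s and 1s rather than noisy real values, which is the type-consistency property the abstract emphasizes.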

Cite

Text

Zhang et al. "Differentially Private Data Release for Mixed-Type Data via Latent Factor Models." Journal of Machine Learning Research, 2024.

Markdown

[Zhang et al. "Differentially Private Data Release for Mixed-Type Data via Latent Factor Models." Journal of Machine Learning Research, 2024.](https://mlanthology.org/jmlr/2024/zhang2024jmlr-differentially/)

BibTeX

@article{zhang2024jmlr-differentially,
  title     = {{Differentially Private Data Release for Mixed-Type Data via Latent Factor Models}},
  author    = {Zhang, Yanqing and Xu, Qi and Tang, Niansheng and Qu, Annie},
  journal   = {Journal of Machine Learning Research},
  year      = {2024},
  pages     = {1--37},
  volume    = {25},
  url       = {https://mlanthology.org/jmlr/2024/zhang2024jmlr-differentially/}
}