DP-PCA: Statistically Optimal and Differentially Private PCA

Abstract

We study the canonical statistical task of computing the principal component from i.i.d.~data under differential privacy. Although extensively studied in the literature, existing solutions fall short on two key aspects: ($i$) even for Gaussian data, existing private algorithms require the number of samples $n$ to scale super-linearly with the dimension $d$, i.e., $n=\Omega(d^{3/2})$, to obtain non-trivial results, while non-private PCA requires only $n=O(d)$, and ($ii$) existing techniques suffer from a large error even when the variance in each data point is small. We propose DP-PCA, a method that uses a single-pass minibatch gradient descent style algorithm to overcome the above limitations. For sub-Gaussian data, we provide nearly optimal statistical error rates even for $n=O(d \log d)$.
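The high-level idea of a single-pass minibatch gradient descent style private PCA can be illustrated with a minimal sketch: one pass of Oja-style power-iteration updates on the leading eigenvector, with Gaussian noise injected into each minibatch gradient. This is only a conceptual illustration, not the DP-PCA algorithm from the paper; the function name, step size, batch size, and noise scale below are illustrative assumptions, and the paper's method calibrates its privacy noise differently.

```python
import numpy as np

def noisy_streaming_pca_sketch(X, batch_size=50, eta=0.1, noise_std=0.01, seed=0):
    """Hypothetical sketch: single-pass minibatch Oja-style updates on the top
    eigenvector, with Gaussian noise added to each gradient. Not the paper's
    DP-PCA algorithm; noise_std is a placeholder, not a calibrated privacy scale."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)
    for start in range(0, n, batch_size):          # single pass over the data
        batch = X[start:start + batch_size]
        grad = batch.T @ (batch @ v) / len(batch)  # minibatch estimate of Sigma v
        grad += noise_std * rng.standard_normal(d) # injected Gaussian noise
        v = v + eta * grad                         # Oja-style gradient step
        v /= np.linalg.norm(v)                     # project back to the unit sphere
    return v

# Usage on synthetic data with one dominant direction
rng = np.random.default_rng(1)
d = 20
u = rng.standard_normal(d); u /= np.linalg.norm(u)
X = rng.standard_normal((5000, d)) + 3.0 * rng.standard_normal((5000, 1)) * u
v_hat = noisy_streaming_pca_sketch(X)
print("alignment |<v_hat, u>|:", abs(v_hat @ u))
```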

Cite

Text

Liu et al. "DP-PCA: Statistically Optimal and Differentially Private PCA." Neural Information Processing Systems, 2022.

Markdown

[Liu et al. "DP-PCA: Statistically Optimal and Differentially Private PCA." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/liu2022neurips-dppca/)

BibTeX

@inproceedings{liu2022neurips-dppca,
  title     = {{DP-PCA: Statistically Optimal and Differentially Private PCA}},
  author    = {Liu, Xiyang and Kong, Weihao and Jain, Prateek and Oh, Sewoong},
  booktitle = {Neural Information Processing Systems},
  year      = {2022},
  url       = {https://mlanthology.org/neurips/2022/liu2022neurips-dppca/}
}