Label Robust and Differentially Private Linear Regression: Computational and Statistical Efficiency

Abstract

We study the canonical problem of linear regression under $(\varepsilon,\delta)$-differential privacy when the datapoints are sampled i.i.d.~from a distribution and a fraction of the response variables are adversarially corrupted. We provide the first provably efficient -- both computationally and statistically -- method for this problem, under standard assumptions on the data distribution. Our algorithm is a variant of the popular differentially private stochastic gradient descent (DP-SGD) algorithm with two key innovations: full-batch gradient descent to improve sample complexity and a novel adaptive clipping scheme to guarantee robustness. Our method runs in time linear in the input size and still matches the information-theoretically optimal sample complexity up to a condition-number factor that depends on the data distribution. Interestingly, when applied to the setting with no adversarial corruption, the same algorithm improves upon the existing state of the art and achieves near-optimal sample complexity.
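
To make the algorithmic idea concrete, here is a minimal sketch of differentially private full-batch gradient descent for linear regression with per-example gradient clipping. It is illustrative only: the fixed clip threshold `C` stands in for the paper's adaptive clipping rule, and the function name and all hyperparameter values (step size, iteration count, noise multiplier) are placeholders, not the calibrated choices from the paper.

```python
# Sketch: DP full-batch gradient descent for linear regression.
# Assumes squared loss; C, sigma, eta, T are illustrative placeholders.
import numpy as np

def dp_full_batch_gd(X, y, T=100, eta=0.1, C=1.0, sigma=4.0, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(T):
        # Per-example gradient of (1/2)(x_i^T theta - y_i)^2 is
        # (x_i^T theta - y_i) x_i.
        residuals = X @ theta - y          # shape (n,)
        grads = residuals[:, None] * X     # shape (n, d)
        # Clip each example's gradient to norm at most C, bounding the
        # influence of any single (possibly corrupted) response.
        norms = np.linalg.norm(grads, axis=1, keepdims=True)
        grads = grads * np.minimum(1.0, C / np.maximum(norms, 1e-12))
        # Average over the full batch and add Gaussian noise scaled to
        # the clip norm; in a real deployment sigma would be calibrated
        # to (eps, delta) via a Gaussian-mechanism/composition analysis.
        noisy_grad = grads.mean(axis=0) + rng.normal(scale=sigma * C / n,
                                                     size=d)
        theta -= eta * noisy_grad
    return theta
```

The clipping step is doing double duty here: it bounds the sensitivity of each update (privacy) and caps how far a corrupted label can pull the gradient (robustness), while full-batch averaging keeps the injected noise per iteration small relative to the signal.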

Cite

Text

Liu et al. "Label Robust and Differentially Private Linear Regression: Computational and Statistical Efficiency." Neural Information Processing Systems, 2023.

Markdown

[Liu et al. "Label Robust and Differentially Private Linear Regression: Computational and Statistical Efficiency." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/liu2023neurips-label/)

BibTeX

@inproceedings{liu2023neurips-label,
  title     = {{Label Robust and Differentially Private Linear Regression: Computational and Statistical Efficiency}},
  author    = {Liu, Xiyang and Jain, Prateek and Kong, Weihao and Oh, Sewoong and Suggala, Arun},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/liu2023neurips-label/}
}