Label Robust and Differentially Private Linear Regression: Computational and Statistical Efficiency
Abstract
We study the canonical problem of linear regression under $(\varepsilon,\delta)$-differential privacy when the datapoints are sampled i.i.d.~from a distribution and a fraction of the response variables are adversarially corrupted. We provide the first provably efficient method for this problem -- both computationally and statistically -- under standard assumptions on the data distribution. Our algorithm is a variant of the popular differentially private stochastic gradient descent (DP-SGD) algorithm with two key innovations: full-batch gradient descent to improve the sample complexity, and a novel adaptive clipping to guarantee robustness. Our method runs in time linear in the input size and still matches the information-theoretically optimal sample complexity up to a condition number factor that depends on the data distribution. Interestingly, the same algorithm, when applied to a setting with no adversarial corruption, still improves upon the existing state of the art and achieves near-optimal sample complexity.
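As a rough illustration of the algorithmic template the abstract describes, the following is a minimal sketch of full-batch differentially private gradient descent for linear regression with per-example gradient clipping. All names here (`dp_fullbatch_gd`, `clip`, `noise_mult`) are illustrative assumptions rather than the paper's API; in particular, the paper's novel adaptive clipping rule is replaced by a fixed threshold `clip`, and `noise_mult` would need to be calibrated to the target $(\varepsilon,\delta)$ via the Gaussian mechanism.

```python
import numpy as np

def dp_fullbatch_gd(X, y, T=100, lr=0.1, clip=1.0, noise_mult=1.0, seed=0):
    """Sketch of full-batch DP gradient descent for linear regression.

    Each example's gradient is clipped to L2 norm `clip`, the clipped
    gradients are summed, and Gaussian noise with standard deviation
    `noise_mult * clip` is added before the averaged update. The paper's
    adaptive choice of the clipping threshold (its key robustness
    ingredient) is replaced here by a fixed `clip` for simplicity.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(T):
        residuals = X @ theta - y            # shape (n,)
        grads = residuals[:, None] * X       # per-example gradients, shape (n, d)
        norms = np.linalg.norm(grads, axis=1)
        scale = np.minimum(1.0, clip / np.maximum(norms, 1e-12))
        clipped = grads * scale[:, None]     # clip each example's gradient
        noise = rng.normal(0.0, noise_mult * clip, size=d)
        theta -= lr * (clipped.sum(axis=0) + noise) / n
    return theta
```

Clipping bounds each example's (and hence each corrupted label's) influence on the update, which is the intuition behind using adaptive clipping for robustness; full-batch gradients avoid the sampling noise of minibatch DP-SGD.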
Cite

Text
Liu et al. "Label Robust and Differentially Private Linear Regression: Computational and Statistical Efficiency." Neural Information Processing Systems, 2023.

Markdown
[Liu et al. "Label Robust and Differentially Private Linear Regression: Computational and Statistical Efficiency." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/liu2023neurips-label/)

BibTeX
@inproceedings{liu2023neurips-label,
  title = {{Label Robust and Differentially Private Linear Regression: Computational and Statistical Efficiency}},
  author = {Liu, Xiyang and Jain, Prateek and Kong, Weihao and Oh, Sewoong and Suggala, Arun},
  booktitle = {Neural Information Processing Systems},
  year = {2023},
  url = {https://mlanthology.org/neurips/2023/liu2023neurips-label/}
}