Private Regression via Data-Dependent Sufficient Statistic Perturbation
Abstract
Sufficient statistic perturbation (SSP) is a widely used method for differentially private linear regression. SSP adopts a data-independent approach where privacy noise from a simple distribution is added to sufficient statistics. However, sufficient statistics can often be expressed as linear queries and better approximated by data-dependent mechanisms. In this paper we introduce data-dependent SSP for linear regression based on post-processing privately released marginals, and find that it outperforms state-of-the-art data-independent SSP. We extend this result to logistic regression by developing an approximate objective that can be expressed in terms of sufficient statistics, resulting in a novel and highly competitive SSP approach for logistic regression. We also make a connection to synthetic data for machine learning: for models with sufficient statistics, training on synthetic data corresponds to data-dependent SSP, with the overall utility determined by how well the mechanism answers these linear queries.
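For reference, the data-independent SSP baseline described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' data-dependent method. The function name `ssp_linear_regression` and its parameters are hypothetical. It assumes rho-zCDP with add/remove-one-row neighbors, and rows clipped so that ||x_i||_2 <= 1 and |y_i| <= 1. Gaussian noise is added once to the combined statistic [X, y]^T [X, y], which contains both X^T X and X^T y, and the normal equations are then solved as post-processing.

```python
import numpy as np

def ssp_linear_regression(X, y, rho, ridge=1e-3, rng=None):
    """Hypothetical sketch of data-independent SSP for linear regression.

    Assumptions (not from the source): rho-zCDP via the Gaussian mechanism,
    add/remove-one-row neighbors, ||x_i||_2 <= 1 and |y_i| <= 1 after clipping.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Clip each feature row to unit L2 norm and each label to [-1, 1].
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / np.maximum(norms, 1.0)
    y = np.clip(y, -1.0, 1.0)
    # Stack Z = [X, y] so that Z^T Z holds X^T X (top-left) and X^T y (last column).
    Z = np.column_stack([X, y])
    S = Z.T @ Z
    # Adding/removing a row z changes S by z z^T, whose Frobenius norm is
    # ||z||^2 <= ||x||^2 + y^2 <= 2; this bounds the L2 sensitivity of the
    # upper-triangular entries we perturb.
    sensitivity = 2.0
    # Gaussian mechanism for rho-zCDP: sigma = sensitivity / sqrt(2 * rho).
    sigma = sensitivity / np.sqrt(2.0 * rho)
    # Sample noise for the upper triangle (incl. diagonal) and mirror it,
    # so the noisy statistic stays symmetric.
    k = S.shape[0]
    E = np.triu(rng.normal(0.0, sigma, size=(k, k)))
    E = E + np.triu(E, 1).T
    S_noisy = S + E
    d = X.shape[1]
    XtX = S_noisy[:d, :d]
    Xty = S_noisy[:d, d]
    # Post-processing: solve the ridge-regularized normal equations.
    return np.linalg.solve(XtX + ridge * np.eye(d), Xty)
```

In a data-dependent variant, only the source of the statistics changes: S would be estimated from privately released marginals instead of being perturbed directly, while the final solve is the same. This is also why ordinary least squares on a synthetic dataset Z_syn is equivalent to solving the normal equations built from Z_syn^T Z_syn.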
Cite
Ferrando and Sheldon. "Private Regression via Data-Dependent Sufficient Statistic Perturbation." Transactions on Machine Learning Research, 2025.
@article{ferrando2025tmlr-private,
title = {{Private Regression via Data-Dependent Sufficient Statistic Perturbation}},
author = {Ferrando, Cecilia and Sheldon, Daniel},
journal = {Transactions on Machine Learning Research},
year = {2025},
url = {https://mlanthology.org/tmlr/2025/ferrando2025tmlr-private/}
}